Individual Poster Page

DIPS year-to-year correlations, 1972-1992 (August 5, 2003)

Discussion Thread

Posted 10:14 p.m., August 7, 2003 (#64) - Alan Jordan(e-mail)
  Warning retread -

Tango invited me to comment on this thread last night. It took a couple of hours to read through all the posts and check a few of the simulations.

Here is another way of looking at the test-retest correlation in terms of what it's supposed to measure - a player's ability vs. the change in ability - and how the two should be related to the Rsqr. Of course R is simply the square root of Rsqr.

Part 1
Rsqr = Variance of player's ability / (Variance of player's ability + Variance of the change in ability from year to year).

The above assumes that player’s ability and change in ability from year to year are independent (unobservable ability, not observable performance). It seems reasonable at the moment and I can always generalize it if need be.

I can't give you a mathematical proof, but I would start by assuming that the change in ability is the error and ability is the model. I can give you this for those of you who are programmers and have some stats software.

Step 1. Generate a variable called X with a variance of 4 (don't worry about the distribution). In the same step, generate a variable called Err with a variance of 9. Create a variable called Y as the sum of X and Err.

Step 2. Calculate the Rsqr between X and Y and it will be close to .3.
.3=4/(4+9)
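
If you want to try it without a stats package, here's a rough Python/numpy version of the two steps above (the seed and sample size are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
n = 100000

x = rng.normal(0, 2, n)    # variance of 4
err = rng.normal(0, 3, n)  # variance of 9
y = x + err

r = np.corrcoef(x, y)[0, 1]
print(r**2)                # close to 4/(4+9) = .3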

Part 2
What does this mean for FJM's denominator hypothesis? An event ratio with a small probability (or a probability far above .5) will have a small variance, because in binary data the probability and the variance are related - the variance of p is p*(1-p). That small variance will cause a small r. Since singles have a higher p, they should have a higher variance and hence a higher r.

Tango - what is the correlation of the logit of these events for each year (the natural log of the odds ratio)? Is that doable for this weekend? If the variance is proportional to the probability of the event, then transforming these data will remove that and give us a better picture. I.e., the r for 1B/PA may be higher than the other r's strictly because it has a higher probability.

Erik Allen -

You said -
The standard deviation is given by
STD = sqrt(n*p*(1-p))

This is wrong. STD = sqrt(P*(1-P)). N doesn't come into play.

The standard deviations are:
1BSTD= 0.4
xbhSTD= 0.3

but your conclusion is correct:

“If you happen to locate a statistic that displays a HIGHER year-to-year correlation, …, then this would seem to imply that the differences in player ability outweigh the variability of the statistic.”

Tango-

Anyway, for these 20 pitchers, here are their year-to-year r
2b: .18, 1b: .47, out: .11
Wouldn't we have expected the out, with the highest numerator, to have the highest r, based on your previous explanation?
Don't read too much into these; you have a sample size of 20. If you don't have a sample size of 20, then you probably did something wrong. My hunch is that even with 1000 pitchers of equal ability, your correlations will be insignificantly different from zero.

Erik Allen-
Corr = sum over i [(x_i-x_avg)*(y_i-y_avg)]
You subtracted the mean, but forgot to divide by the standard deviation. The Corr is a covariance of standardized variables. What you have is a covariance of centered variables.

“In your first simulation, all 20 pitchers should have the same ability. Therefore, if pitcherX were ABOVE average one year, we should not expect him to be ABOVE average the second year, and I would think that corr=0 for a sufficiently large sample.”

I think that’s a good insight.

“range of 0.09 to 0.11. So, on a relative basis, these are the same ranges. The correlation coefficients here are:
1B = 0.46
xbh = 0.28
So, from here we can see that there is significantly less predictability in xbh rate, despite the fact that the relative variation in the statistics is approxiamtely the same.”

I ran your simulation and got about the same numbers you did, but I can't find a transformation that will equalize the r's. I tried logit, ln and square root. Even a nonparametric correlation didn't do the trick. Beats me.

Tango-
You said:

“Will the larger spread in talent among pitchers allow us to get an r to approach 1?”
Absolutely.

FJM-
You said:
“But that assumes every 0.20 pitcher remains a 0.20 pitcher, every 0.18 pitcher stays right there, and so on. How realistic is that? Well, if the range of abilities is very narrow, then the chance of any pitcher greatly improving (or worsening)is very remote. But if the range is very wide, significant changes in year-to-year ability are certainly possible.”
No, the abilities and changes in abilities are assumed to be independent.

“So you can get a small r in either of 2 ways: 1)very small differences in true ability among pitchers with a lot of random variation, or 2)large differences in true ability accompanied by large year-to-year variation in that ability for individual pitchers”
Exactly

Tango-
You said:
“ To recap, the year-to-year r is dependent on:
1 - how many pitchers in the sample
2 - how many PAs per pitcher in year 1
3 - how many PAs per pitcher in year 2
4 - how much spread in the true rates there are among pitchers (expressed probably as a standard deviation)
5 - possibly how close the true rate is to .5
6 - the true rate being the same in year 1 and year 2”
All are true but number 1. The number of pitchers affects the standard error, or the precision of our estimate. With only 20 pitchers, our estimate of r might be too high or too low, but r itself remains unchanged.


Empirical Win Probabilities (August 28, 2003)

Discussion Thread

Posted 12:23 a.m., August 29, 2003 (#4) - Alan Jordan
  I made a little approximation equation using the data provided. I have no theoretical knowledge of how exactly this model should be specified, so I just did a logistic regression. I looked at interactions, and even though they were significant, they didn't add much to the predictive power, so I dropped them. I even dropped the inning variable because it's so correlated with the difference in runs. With this huge sample size it had a p value of only .01, and when I rounded the coefficient off to two decimal places it was .00, so why bother adding it to the model. Anyway, here it is.

WE=exp(LF)/(1+exp(LF))

LF=
0.58 +
HOME *0.5 +
DIFRUNS *0.7 +
OUTS * -0.18 +
SIT2=1 *-0.66 +
SIT2=2 *-0.5 +
SIT2=3 *-0.41 +
SIT2=4 *-0.28 +
SIT2=5 *-0.34 +
SIT2=6 *-0.21 +
SIT2=7 *-0.1

(Sit2 is situation but only goes 1-8, outs have been recoded into another variable).

The area under the ROC curve is .83, which means that if you had the WE from this model you would be right about 83% of the time. Of course it's biased upwards a little. The more complicated models I tried didn't get above .84.

If anybody has any better ideas let me know.


Empirical Win Probabilities (August 28, 2003)

Discussion Thread

Posted 10:28 a.m., August 29, 2003 (#6) - Alan Jordan
  Here's a model for the ninth inning

LF=
1.2898
HOME *0.5231
Diffruns*1.4595
OUTS *-0.3738
SIT2=1 *-1.2722
SIT2=2 *-0.9408
SIT2=3 *-0.7069
SIT2=4 *-0.4566
SIT2=5 *-0.5013
SIT2=6 *-0.2663
SIT2=7 *-0.106

The interaction between sit and difruns adds nothing substantial at all to the predictive power according to this data set and the models. Both models have areas under the curve of .93. Sure the model with the interaction would have a slightly higher area under the curve, but you would have to go to the third decimal point to see it. It's not worth adding six more terms to your model. Remember that this model is essentially multiplicative, not like a linear regression which is additive.

If you did a model like this for each inning the area under the curve would be .84. Maybe you care more about the late innings, I don't know.

Yes Sit2=8 is multiplied by 0.


Empirical Win Probabilities (August 28, 2003)

Discussion Thread

Posted 10:17 p.m., August 29, 2003 (#9) - Alan Jordan
  O.K. Tango, first, what do you mean by within 3 runs?

if abs(diffruns) <= 3 then w3=1; else w3=0;
if diffruns >= 0 and diffruns <= 3 then w3=1; else w3=0;

The first groups games into close or not; the second groups games into (tied or small lead) vs. (big lead or behind by at least one run). Both of these groupings of diffruns cause the predictive ability to drop off steeply.

****************************************

Studes

The basic model is the logistic function

WE=exp(LF)/(1+exp(LF))

where LF is a linear function, i.e. a straight-line equation. exp(LF)/(1+exp(LF)) bends the straight line into an S-shaped curve that can never quite hit 1 or 0. For those of you familiar with odds ratios, the model can also be expressed as:

WE/(1-WE)=exp(LF) or

ln(WE/(1-WE))=LF

The logistic regression is a generalization of the additive linear regression model, but because all the coefficients are actually exponents of e, it's really a multiplicative model.

exp(m+n)=exp(m)*exp(n)

as for the last model I posted which was:

LF=
1.2898
HOME *0.5231
Diffruns*1.4595
OUTS *-0.3738
SIT2=1 *-1.2722
SIT2=2 *-0.9408
SIT2=3 *-0.7069
SIT2=4 *-0.4566
SIT2=5 *-0.5013
SIT2=6 *-0.2663
SIT2=7 *-0.106

Only diffruns and outs are continuous variables. All of the others are dummy variables (0,1). Home is 1 when it's the home team and 0 when it's the visiting team. SIT2=1 is 1 when situation = 1 and 0 otherwise. If you have K groups, then you need K-1 dummy variables. If sit2=1 through sit2=7 are all 0, then logically situation = 8, so there is no reason to create a dummy variable for sit2=8. It actually causes problems with the matrix algebra if you do.

Now for an example. Suppose it's the ninth inning (the model only works for the ninth inning) and the home team has men on 2nd and 3rd with 1 out, and they are behind by 2 runs.

Since it's the home team, home = 1; since they are behind by 2, diffruns = -2; outs = 1; and "sit2=7" = 1 because we have men on 2nd and 3rd. All other sit2 variables must = 0.

1.2898
1 *0.5231
-2*1.4595
1 *-0.3738
0 *-1.2722
0*-0.9408
0 *-0.7069
0 *-0.4566
0 *-0.5013
0 *-0.2663
1 *-0.106

LF=-1.59

and

WE=.17

unless I screwed something up.
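
If anyone wants to check the arithmetic, here's a quick Python sketch of the same calculation (the dictionary layout and function name are just mine, not anything official):

import math

coef = {"intercept": 1.2898, "home": 0.5231, "diffruns": 1.4595, "outs": -0.3738,
        "sit2": {1: -1.2722, 2: -0.9408, 3: -0.7069, 4: -0.4566,
                 5: -0.5013, 6: -0.2663, 7: -0.106, 8: 0.0}}

def win_expectancy(home, diffruns, outs, sit2):
    # LF is the linear function; WE = exp(LF)/(1+exp(LF))
    lf = (coef["intercept"] + coef["home"]*home + coef["diffruns"]*diffruns
          + coef["outs"]*outs + coef["sit2"][sit2])
    return math.exp(lf) / (1 + math.exp(lf))

# home team, down by 2, 1 out, men on 2nd and 3rd (sit2=7)
print(win_expectancy(home=1, diffruns=-2, outs=1, sit2=7))  # about .17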


Empirical Win Probabilities (August 28, 2003)

Discussion Thread

Posted 3:14 a.m., August 31, 2003 (#10) - Alan Jordan
  O.K., I see what you're doing, Tango. Here is an equation that will allow you to compare WEs for your table. I have already compared them. I merged your WEs, mine, and the actuals. I estimated the number of games won for both systems by multiplying the WE by the number of games. That way scenarios with 7,000 games got more weight than those with 50. I then calculated discrepancies as
abs(estWE-ObservedWE) for both systems. Yours had 16,973 discrepancies and mine had 12,064. This is from a base of 156,857 games played, for whatever that tells you.
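
In code form, the comparison is nothing more than this (a sketch - the column names are made up, not from Tango's or Phil's files):

import pandas as pd

# df: one row per scenario with hypothetical columns
# games, we_obs (actual), we_tango and we_mine (the two estimates)
def total_discrepancy(df, col):
    # estimated wins vs. observed wins, weighted by the games in each scenario
    return (df["games"] * (df[col] - df["we_obs"]).abs()).sum()

# total_discrepancy(df, "we_tango"), total_discrepancy(df, "we_mine")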

Here is the model that only works for the 7th, 8th and 9th innings.

LF= 1.0298 +
HOME* 0.5714 +
OUTS* -0.2929 +
SIT2=1* -1.0464 +
SIT2=2* -0.7885 +
SIT2=3* -0.6092 +
SIT2=4* -0.4281 +
SIT2=5* -0.5105 +
SIT2=6* -0.2745 +
SIT2=7* -0.1106 +
INN=7* -0.0138 +
INN=8* 0.0365 +
DiffRuns* 1.4561 +
Diffruns*INN=7* -0.5995 +
Diffruns*INN=8 *-0.3652 ;

If you want the table line by line, let me know. I'll probably have to email it to you.

I would print them out line by line, but your table is 334 lines long, not counting repeated headers.


Empirical Win Probabilities (August 28, 2003)

Discussion Thread

Posted 10:17 a.m., August 31, 2003 (#12) - Alan Jordan
  It would be better if you did it. That way you can check to see if I screwed anything up. You never need to ask my permission to post something like that. I've obviously put it out for public consumption.

I thought top and bottom of the inning reflected home-field advantage. Is that wrong? Because that's what I went off of.

As for any formal comparison of the fit of these two models, data from later years should be used. The fit from mine is biased and if your model was built off of data from these years, it's probably biased as well.

My winter project is to take the 2002-2003 play by play data and see if I can make a comparison of closers based on the number of men on, outs, inning, park, home plate umpire and strength of hitting.

It will be a logistic model like this except that it will also include terms for parks, umps, and opposing teams. The dependent variable will either be runs allowed or probability of a save. The closers can then be ranked by their coefficients in the model. This was a warmup of sorts for that so thanks to you and Phil for the free data.

P.S. if anyone else wants to tackle this project feel free to steal the idea. The hard part is separating the closer from the defense since closers don't rotate teams during the season. The idea of some kind of dips adjusted model is daunting and I may just stop without separating the pitcher from the defense.


Empirical Win Probabilities (August 28, 2003)

Discussion Thread

Posted 10:09 p.m., September 3, 2003 (#15) - Alan Jordan
  O.K., it took a while to get everything straightened out, but I have been able to verify the numbers for both my estimates of WE and Phil's empirical estimates based on the raw numbers. Both check out fine by me. Given the set of scenarios here, which is different from the first, your estimates have fewer discrepancies than mine, 8,929 to 13,908.

I'm confused as to why you say that your model doesn't factor in home field advantage, yet you have home and away on your table.


Empirical Win Probabilities (August 28, 2003)

Discussion Thread

Posted 10:49 p.m., September 3, 2003 (#17) - Alan Jordan
  Got it.



Sabermetrics Crackpot Index (August 29, 2003)

Discussion Thread

Posted 11:11 a.m., September 1, 2003 (#17) - Alan Jordan
  100 points for claiming that the people who secretly run baseball will change all the rules once they understand the brilliance and the irresistible new philosophy of your ideas.

25 points for writing everything from your mother's basement.


Road Warriors (September 4, 2003)

Discussion Thread

Posted 11:24 p.m., September 4, 2003 (#7) - Alan Jordan
  I know I'm going to get myself in trouble for saying this, but here goes.

I wouldn't recommend using the Pyth or any variation of the Pyth in your formulas unless you just don't have game-level data. If all you have is seasonal total data, then go ahead, use the Pyth and ignore the rest of this message. If you have game-level data and want to use the Pyth, then remove the effects of home-field advantage, parks and starting pitchers (possibly plate umps if you want) and then use the Pyth.

The reason is that the Pyth, and any function that uses seasonal totals, is inefficient and in some cases biased when data from different distributions are mixed together. Any function that uses seasonal total data, whether it's runs scored, runs allowed, HRs, GIDPs or whatever, treats all of the runs scored etc. as if they are equal, i.e. as if they come from the same distribution. Games played in high-scoring parks such as Mile High in Colorado produce higher runs scored and allowed than games played in the Dodgers' home park. This should be adjusted for before you plug it into the Pyth. It's complicated and it requires game-level data, but it's better than just taking a run estimator with season total data and plugging the estimated runs scored and runs allowed directly into the Pyth.

Again, if all you have is season total data, go ahead, you're pretty much stuck with that.


Road Warriors (September 4, 2003)

Discussion Thread

Posted 9:04 p.m., September 5, 2003 (#9) - Alan Jordan
  The park factors are probably the biggest problem in terms of bias or imprecision (inefficiency in stat jargon) of the estimates, but adding starting pitchers to your model along with teams, opponents and home-field advantage, while a huge pain in the ass, increases the precision of the estimates and allows you to make predictions conditional on the starting pitcher, which might be a little more realistic. Of course you have to assume a rotation for each team.

For simplicity's sake, there's not much harm in ignoring the starting pitchers under the assumption that they are randomly spread throughout the season.



By The Numbers - Sept 7 (September 8, 2003)

Discussion Thread

Posted 9:20 a.m., September 9, 2003 (#3) - Alan Jordan
  What alpha are you referring to? The alpha from Skiena's formula for the probability of one jai alai player beating another, or are you talking about Cronbach's alpha for internal consistency, or some other alpha?


By The Numbers - Sept 7 (September 8, 2003)

Discussion Thread

Posted 6:32 p.m., September 9, 2003 (#10) - Alan Jordan
  Reno -

David Massey has fairly up to date game by game data for 2003 at

http://www.masseyratings.com/data/mlb.gms

Without having read the book, I can't tell you exactly how Skiena derived his alpha. However, when I do this sort of work I generally use nonlinear least squares. You can also use reweighted nonlinear least squares or, if you have the programming skill and the likelihood function at hand, maximum likelihood. I have SAS, so I can use nonlinear least squares in proc nlin. If you don't have SAS or SPSS or some stat package that can do nonlinear equations, then you have to know how to program it.

As an aside, it would appear that Skiena's formula needs to be generalized to accommodate other factors such as parks, home-field advantage and league average, not to mention multinomial outcomes.

As for BP's 3rd order win%, I wouldn't rely too heavily on anything that uses aggregated season total runs or events such as hits, because strength of schedule and park factors can't be removed. Davenport doesn't explain what he's doing for 3rd order. He explains 1st and 2nd order, and they are definitely season-aggregated.

I pretty much slammed Davenport's system in a thread called "Tigers winning percentage inflated?" at fanhome.com. In retrospect I was probably a little too harsh on him, but I was appalled that someone who I thought had access to game-level data wasn't taking advantage of it. Rob Neyer isn't any better, and I KNOW he has access to game-level data, but he still uses the Pyth with seasonally aggregated data. All the others I have seen that use event data such as hits seem to use seasonally aggregated data. The problem appears to be that there is no current, up-to-date source of game-level data with hits that people can use to build better models. Once such a source becomes available, Davenport's 1st and 2nd order winning percentage models will be truly obsolete.


Pitchers, MVP, Quality of opposing hitters (September 19, 2003)

Discussion Thread

Posted 10:33 a.m., September 22, 2003 (#9) - Alan Jordan
  Tango -
"As long as the distribution of where they pitch can be explained by random chance, then we don't need to consider the park factor. I think that the Central Limit Theorem would apply (though don't quote me on that)."

It's not a question of the central limit theorem. I've noticed that when you invoke the central limit theorem, what you usually mean is "when the sample gets large enough". I think that's what you mean here. The central limit theorem relies on the latter, but it's not the latter itself.

The question of whether park factors are necessary hinges on two things.

1. How important are the park factors - the larger they are, the more likely you need to deal with them.
2. How evenly distributed are the park appearances for the pitchers - this is where your comment comes into play. Ideally, with enough starts randomly scattered across the parks, no park adjustment would be necessary because the appearances would be approximately evenly distributed among parks. Unfortunately, this would take far more than the 30-45 starts that pitchers get. It would probably take a couple of hundred.

The idea of randomness is that with a large enough sample you get evenness of distribution, but it's much more efficient to use a non-random distribution if you want evenness. For example, have a pitcher start one game and only one game in each park. Then park factors would be unnecessary.

The best resolution of this question is to handle it on a play by play basis. That way you can factor in hitter, pitcher, park, balls in play and event.


Results of the Forecast Experiment (October 2, 2003)

Discussion Thread

Posted 11:08 a.m., October 3, 2003 (#3) - Alan Jordan
  I did paired t-tests and none of them were significant because of the small sample size. A couple were marginal, around p<.15 I think, which hints that with a larger sample some might have been significant.

I would suggest a larger sample of hitters and pitchers next year, say 40 each.

Of course even if you only do 30 total next year, the results can be combined for a larger sample size.


Injury-prone players (October 14, 2003)

Discussion Thread

Posted 6:23 p.m., October 14, 2003 (#13) - Alan Jordan
  "Do a matched-pair study. That is, you have 2 groups that are equals in terms of:
- age
- position
- body type
- performance level"

Forget about matched pairs. You have to break continuous variables into discrete levels (i.e. age becomes 18-25, 26-30, etc...), you lose cases because they can't be matched and then you have arguments over what's a pair in the first place.

Go back to

Days on DL = x + y, where
x = 31 if injury prone
y = 1.3 * (Age - 23)

and add dummy variables for positions (X is a dummy variable).
You can add variables for body types (if you have that data) and performance. Also, if you think that catchers wear out faster than other position players, you can add a slope dummy for catchers where cage=0 for all positions except catcher, where cage=age. This allows age to have a different effect for catchers on the number of days on the DL (cage can also be called an interaction between position=catcher and age). You don't have to throw out any cases unless you think there is a group that is theoretically problematic.

You can also try nonlinear transformations of age such as the square, square root, log and inverse to see if the effect of age increases/decreases per year as the players get older.

Show the t values or p values for your equation so people can tell if it's just chance. I can't imagine that a coefficient of 31 isn't significant, but what about the 1.3 for age?
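
If it helps, here's roughly what the design matrix described above (the position dummies plus the catcher slope dummy) looks like in Python/numpy - a sketch, with hypothetical inputs and "C" as a made-up label for catchers:

import numpy as np

def build_design(injury_prone, age, position):
    # injury_prone: 0/1 array; age: numeric array; position: array of strings ("C" = catcher)
    positions = sorted(set(position))[:-1]             # drop one position as the reference level
    cols = [np.ones(len(age)), injury_prone, age - 23]
    cols += [(position == p).astype(float) for p in positions]
    cols.append(np.where(position == "C", age, 0.0))   # the catcher slope dummy ("cage")
    return np.column_stack(cols)

# X = build_design(injury_prone, age, position)
# beta, *_ = np.linalg.lstsq(X, days_dl, rcond=None)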


Injury-prone players (October 14, 2003)

Discussion Thread

Posted 12:02 a.m., October 15, 2003 (#18) - Alan Jordan
  Tango- The "x"/"std error" was about 3.5 for the "31" and less than 1 for the "1.3".

They measure how many standard errors the estimate is from zero, and they are usually referred to as t values. The t distribution has a different shape depending on the number of cases you have. As your number of cases becomes larger, the t distribution becomes the z distribution.

I'm somewhat surprised that your stat program doesn't provide a significance level. If you have a table of t values and a little practice using it, you can translate your t values into significance levels or p values. If you don't have a table, there are some rules of thumb to help.

If absolute value of t is greater than 2 then p<.05
if absolute value of t is greater than 3 then p<.01
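
If your program gives you t but not p, scipy can do the conversion (a sketch; df is the residual degrees of freedom, somewhere around 97 here with 100 players and a few coefficients):

from scipy import stats

def p_from_t(t, df):
    # two-sided p value for a t statistic with df degrees of freedom
    return 2 * (1 - stats.t.cdf(abs(t), df))

print(p_from_t(2.0, 97))   # roughly .05
print(p_from_t(3.5, 97))   # well under .01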

According to what you're posting, being injury prone is significant even controlling for age at p<.01 (actually p<.001 here), while age isn't significant. Of course, if there is a curvilinear relationship between age and being on the DL, then linear regression is going to underestimate the relationship here because age is specified as a linear effect, but that's kind of piddling here because the mean difference was only one year between two groups of 50 cases.

Basically the sample here implies that injuries are more a function of individual players than age.


Injury-prone players (October 14, 2003)

Discussion Thread

Posted 12:22 a.m., October 15, 2003 (#19) - Alan Jordan
  FJM - You are mixing two different phenomena here: frequency and severity of injury. Frequency should be more predictable than severity. You don't want to treat a player with 3 different visits to the DL totalling 90 days the same as one with a single, 90-day layoff.

It wouldn't hurt to run the analysis both ways. In fact that's very often done. The researchers might present both results or summarize one in the footnotes.

It's an empirical question, but my guess is that severity and frequency are correlated. People who miss work often also tend to be out for longer periods. That's a different process from baseball, because motivation tends to push people away from work but pushes baseball players towards playing. Anyway, there is a theoretical justification for trying to predict days on the DL.

There is also another reason for doing days on the DL instead of the number of trips. The number of trips to the DL is discrete (0,1,2,3...). Since most players will have 0, some will have 1 and a smaller group will have 2, etc., it will be difficult to get a high r-square and more difficult to get significant p values because of the low amount of variance (everyone bunched towards 0) and because the dependent variable is discrete (discrete dependent variables tend to have lower r-squares). Also, the justifying assumptions of linear regression tend to break down when you have discrete dependent variables, which can cause your significance levels to be wrong. Being anal about this, I would model the number of trips to the DL with a Poisson regression, a negative binomial regression or an ordinal logistic regression and see which fits better.


Injury-prone players (October 14, 2003)

Discussion Thread

Posted 12:26 a.m., October 15, 2003 (#20) - ALan Jordan
  Jim - A related issue I'd love to see studied is whether there are any injury patterns on the team level. Do some teams have consistently more injuries than others, at least more than would be expected by chance?

I think that would be the most interesting question of all. If we had the data for several years for players and teams, we could see if teams are killing players and/or trading for known deadwood.


Anatomy of a Collapse (October 15, 2003)

Discussion Thread

Posted 5:08 p.m., October 15, 2003 (#15) - Alan Jordan
  Damn!

Good work, Tango. That's a great use for WPA that I wouldn't have thought of. Of course the fan will always be blamed, because it's easier to latch onto an anecdote than to weigh facts.

By the way what is the fan's wins above replacement fan?


Anatomy of a Collapse (October 15, 2003)

Discussion Thread

Posted 10:05 p.m., October 15, 2003 (#25) - Alan Jordan
  Craig B -
This fan was at the game, but was also -.031 wins above average for the play at the railing. So his Wins Above Replacement Fan was -.03099, I think. :)

Impressive, but can you calculate how much the Cubs should pay him not to come to a game? :)


Relevancy of the Post-season (October 16, 2003)

Discussion Thread

Posted 11:15 p.m., October 16, 2003 (#4) - Alan Jordan
  I will agree that the postseason isn't guaranteed to select the "best" team, because of the heavy role of chance in short series and the arbitrary grouping of teams (the Twins got in while the Mariners stayed home). It's easy to argue mathematically that it's not the best way, but repeated postseason success, when it happens, can be considered evidence of a quality team. It's hard to argue that the Yankees of the late 90's and early 00's weren't/aren't the best team in baseball during that period. Their postseason success exceeds the expected success that teams in the postseason would get assuming they were all equal. I could also argue that the Braves have been the best team in the National League, but not in baseball, between 1991 and now, based on their postseason success.

There are lots of games/contests where the main objective is to win and players have to deal with arbitrary conditions that don't necessarily reward the best overall performance; then we have to decide whether they won by superior skill/strategy or luck. War, poker, and presidential elections (a candidate can win the majority of votes but still lose) are a few examples. There are plenty of others.

Determining who is best or who played best is somewhat problematic, but long term success should indicate that it's probably not luck.

Yanks winning the World Series after having only won 87 games - probably luck.

Yanks winning 4 out of 7 World Series - probably not luck.


Relevancy of the Post-season (October 16, 2003)

Discussion Thread

Posted 11:16 p.m., October 17, 2003 (#11) - Alan Jordan
  Of course it's semantics. Semantics allow someone to define best as the winner of the World Series or the pennant of a league. Of course, everybody else is free to disregard that definition and use another. Not arguing with you, David; in fact I think the management of the Braves looks at it the way you do when they put a team together every year. I'm sure they compare themselves with the Yankees and feel woefully inferior despite what they say publicly. I think they would be quite happy with a losing record for the season if it were just good enough to get them the wild card and they went on to win the World Series.

Champions is a good term to distinguish the team that achieved the objective of the season from the theoretically murky and unobservable best team that would have won over an infinite set of games.

"I don't know what the "best team" means if it isn't the team that won."

If you define the best team as the one that would win the highest percentage of games in a balanced schedule of infinite games, and assuming that strength doesn't change over the course of the season, then there is still no guarantee that 160 games in an unbalanced schedule will determine who is best during a year. With a definition like that, the quality of a team is unobservable and can only be estimated by observable variables such as wins, runs allowed, runs scored, etc. In fact, with a definition like that, and considering that major league sports try to keep the talent level between teams equal, there may not even be a best team, although there are clearly groups of teams that are better than others. One of the requisites for proving a difference between teams is that a difference exists in the first place. If one team always won, then nobody would watch the games. It is the competitive balance that keeps people interested in the games.

I wouldn't pronounce the postseason irrelevant or strictly ornamental. It may not tell you the best team for the season, but the standings, computerized polls and simulations can't guarantee to do that either.


Results of the Forecast Experiment, Part 2 (October 27, 2003)

Discussion Thread

Posted 8:05 p.m., October 27, 2003 (#31) - Alan Jordan
  "The baseline forecast is very simple: take a player's last 3 years OPS or ERA. If he was born 1973 or earlier, worsen his OPS by 5% or his ERA by 10%. If he was born 1976 or later, improve his OPS by 5% or his ERA by 10%. The 1974-75 players will keep their 2000-2002 averages."

I missed that in part one. I was thinking the monkey was just last year's OPS or something. If this is a monkey, it must have its own library card and bifocals. No wonder the monkey beat more than half of the readers. This is obviously the Warren Buffett of monkeys.


Results of the Forecast Experiment, Part 2 (October 27, 2003)

Discussion Thread

Posted 9:50 p.m., October 27, 2003 (#35) - Alan Jordan
  Thanks Studes


Results of the Forecast Experiment, Part 2 (October 27, 2003)

Discussion Thread

Posted 4:56 p.m., October 29, 2003 (#69) - Alan Jordan
  Michael - "It may well be the case that naive (or sophisticated-naive for a tangotiger monkey) algorithms do really well when there is a lot of uncertainty, but when things are fairly predictable they may underperform scouting or educated guesses."

There are two kinds of uncertainty: 1. where the underlying system stays the same, but there is random noise in the data; 2. where the system itself changes.

Algorithms shine in the first type and fail miserably in the second.


Baseball Graphs - Money and Win Shares (November 28, 2003)

Discussion Thread

Posted 11:06 p.m., November 28, 2003 (#3) - Alan Jordan
  The regressing of net win shares on salary and then taking the residual seems completely unnecessary. Multiply Win Shares by $300,000 to put them in terms of dollars and then subtract salary, and you have the net value added to the team. You could also divide win shares by salary, or wins above replacement by dollars above replacement. Any of these will give you a valid version of productivity in relation to salary. If I read the article correctly, this was done so that we could evaluate GMs, but since these methods already give you productivity in relation to dollars, you're there before you do the regression.

Also, if you define value as benefit-cost or benefit/cost, you shouldn't be running regressions with value as the dependent variable and cost as the independent variable. Cost is explicitly stated in the dependent variable (value), and this regression will always produce a negative r by definition.

For example, I took this data, created a normally distributed random variable with a mean of 0 and a standard deviation of 1,000,000, and then subtracted salary from it. The correlation between this number and salary was -.95.


Baseball Graphs - Money and Win Shares (November 28, 2003)

Discussion Thread

Posted 10:06 p.m., November 29, 2003 (#8) - Alan Jordan
  I listed out three ways of doing it: the first you already did, and one was Tango's. I don't know if one is any better than the others.

The way you have defined value will by definition give you a positive correlation between win shares and value. At the same time there will also be a negative correlation between salary and value. Also, if you put win shares and salary into a regression to predict value, you should get an r-square of 1, meaning perfect prediction.
Every correlation implies a linear equation like Y=M*X+B+E.

Y is your dependent variable
M is the slope
X is the independent variable
B is the y-intercept or constant
E is the error (all omitted or mispecified variables)

in this case X is salary and E is winshares. If winshares and salary were uncorrelated (this isn't true), then M would be 1 and B would be 0.

As for how to get what you want, I suggest you take the team data that has win shares and salary and fit a logistic (or probit) regression through it. The logistic function has the nice property that the predicted value can't go below 0 or above 1 (it has to be between 0 and 1). It follows an S-like curve that is usually approximately linear between .3 and .7. This is probably what you want, because you should get progressively less increase in wins as you spend more.

I took a look at 2003 data from ESPN and correlated salary to win percentage. The logistic function only slightly outperformed the linear, so I went with the linear. I did a linear regression where win percentage was the dependent variable and salary was the independent variable. I then created a variable called GM, which is simply the residual: win percentage minus predicted win percentage.

Oakland came out on top, followed by Toronto; Florida was 3rd and Atlanta was 4th. Detroit out-sucked the NY Mets by a 9% winning-percentage margin for the title of salary misallocation champions. Here is the whole table.

Obs team gm

1 Oakland 0.12204
2 Toronto 0.09450
3 SanFranc 0.08703
4 Florida 0.08588
5 Atlanta 0.07297
6 Minnesot 0.05571
7 KansasCi 0.05047
8 Montreal 0.04649
9 Seattle 0.04299
10 Houston 0.03818
11 Boston 0.03366
12 ChicagoS 0.02888
13 Philadel 0.02650
14 ChicagoC 0.01347
15 Arizona 0.00585
16 Pittsbur 0.00340
17 St.Louis -0.00113
18 Milwauke -0.01708
19 NYYankee -0.01852
20 Anaheim -0.02613
21 Colorado -0.03049
22 TampaBay -0.03357
23 Clevelan -0.04832
24 LosAngel -0.05128
25 Cincinna -0.05947
26 Baltimor -0.06183
27 SanDiego -0.06843
28 Texas -0.07044
29 NYMets -0.11967
30 Detroit -0.20166

I don't fully trust the salary data from ESPN for a couple of reasons. First, it was opening day data, so if a player was traded, his salary was attributed completely to his first team, which probably underrepresents the payroll of teams like the Yankees. Second, Mike Hampton's 12 mil salary was attributed entirely to Atlanta even though Colorado and Florida were paying most of it this year. So who knows how accurate it is.

Anyway, you can take your team win share and salary data and do the same thing. If you post it on your site or at fanhome, I'll run it for you.
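
The calculation itself is tiny - something like this in Python/numpy (a sketch; salary and win_pct are arrays with one entry per team):

import numpy as np

def gm_residuals(salary, win_pct):
    # fit win% = b0 + b1*salary by least squares and return the residuals
    X = np.column_stack([np.ones(len(salary)), salary])
    beta, *_ = np.linalg.lstsq(X, win_pct, rcond=None)
    return win_pct - X @ beta

# positive residual = more wins than the payroll predicts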


Baseball Graphs - Money and Win Shares (November 28, 2003)

Discussion Thread

Posted 10:19 a.m., November 30, 2003 (#11) - Alan Jordan
  "Cost is explicitly stated in the dependent variable (value), and this regression will always produce a negative r by definition."

That's wrong. That should read:

"Cost is explicitly stated in the dependent variable (value), and this regression will always produce a MORE negative r by definition."

A negative r is only guaranteed when benefit and cost are uncorrelated, and that's an extremely abnormal scenario.

Actually your regression of value on salary may have a use as a test of market efficiency. Let me think on this.


Baseball Graphs - Money and Win Shares (November 28, 2003)

Discussion Thread

Posted 10:54 p.m., November 30, 2003 (#12) - Alan Jordan
  O.K. If the correlation between value (productivity - cost) and cost is negative, then that should mean that people are on average overpaying for productivity. If it's positive, then people are on average underpaying. If it's 0, then people are paying the right price.

Statisticians will cringe at having cost on both sides of the equation, but in this case, I think it's o.k. In general avoid it if you can.

As for linking my quotes to your site, I wasn't sure where to post it anyway. You can quote/link anything I post.



Marcel, The Monkey, Forecasting System (December 1, 2003)

Discussion Thread

Posted 11:16 a.m., December 2, 2003 (#21) - Alan Jordan
  Another way of doing a weighted average is to weight each season by the inverse of the squared error of its mean (the inverse of its UNreliability). This gives the best (in terms of smallest mean squared error) estimator of the mean.

The standard error of the mean for a batting average is

sqrt(BA*(1-BA)/AB)

Where BA is batting average and AB is # of at bats

and the square of that is simply

BA*(1-BA)/AB

let's call that variance of the error or VE

If you wanted to weight two seasons, then you could weight them by the inverse of the variance of their errors. For example, if season 1 had a BA of .350 and 20 AB, while season 2 had a BA of .270 and 300 AB, then the variances would be

VE1=.350*(1-.350)/20=0.011375
W1= 1/VE1 = 1/0.011375 =87.91

VE2=.270*(1-.270)/300=0.000657
W2= 1/VE2 = 1/0.000657 = 1522.07

The weighted Mean is
WM1=(1/VE1*BA1 + 1/VE2*BA2)/(1/VE1+1/VE2)=

(1/0.011375*.350 + 1/0.000657*.270)/
(1/0.011375+1/0.000657)=

.274

Now what about weighting season 2 more than season 1?

Assume that you want to weight season 1 by 3 and season 2 by 5 (pick any weights you think appropriate). Then just modify the weights from 1/VE1 and 1/VE2 to 3/VE1 and 5/VE2. The resulting weighted mean is

WM2=(3/VE1*BA1 + 5/VE2*BA2)/(3/VE1+5/VE2)=.273

These two weighted means both factor in the number of at bats and an amount of error proportional to the batting average (a proportion). You could simplify these by dropping the BA*(1-BA) term. This would leave you with

WM3=(AB1*BA1 + AB2*BA2)/(AB1 + AB2) or

WM4=(3*AB1*BA1 + 5*AB2*BA2)/(3*AB1 + 5*AB2)

This gives a weight proportional to AB rather than to the square root of AB.
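
Here's a quick Python check of the weighted means above (plain Python, no packages needed):

ba = [0.350, 0.270]
ab = [20, 300]
ve = [b * (1 - b) / n for b, n in zip(ba, ab)]   # binomial error variances

w1 = [1 / v for v in ve]                         # equal season weights
wm1 = sum(w * b for w, b in zip(w1, ba)) / sum(w1)

w2 = [3 / ve[0], 5 / ve[1]]                      # weight season 1 by 3, season 2 by 5
wm2 = sum(w * b for w, b in zip(w2, ba)) / sum(w2)

print(round(wm1, 3), round(wm2, 3))              # 0.274 and 0.273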


Marcel, The Monkey, Forecasting System (December 1, 2003)

Discussion Thread

Posted 12:29 a.m., December 6, 2003 (#37) - Alan Jordan
  AED, Where do you get this:

AB/(1+x*AB*dy)?

The efficient (minimum variance) weight for an observation when there is heteroskedasticity (systematically unequal variances across observations) is 1/VE, where VE is the variance for that observation.

See Econometric Models & Economic forecasting 3rd ed. by Pindyck & Rubinfeld on page 149-153.

Assuming that we can use the average BA to estimate variance due to the binomial distribution, you get a weight of 1/AB.

Adding in year to year variance, x and a term for lags, dy should get you 1/(AB + x + dy). Where do you get the AB in the numerator and the 1 in the denominator?

Can you expand on this too?

"If player abilities and random errors are distributed normally, in fact, weighting in this way is exactly the same as making a probability analysis."


Marcel, The Monkey, Forecasting System (December 1, 2003)

Discussion Thread

Posted 11:15 p.m., December 7, 2003 (#40) - Alan Jordan
  AED, I screwed up in not one but two places. I got my weights and errors mixed up, and I put x+dy instead of x*dy.

If we ignore r*(1-r), then my VE would be 1/AB + x*dy. Just taking the reciprocal gets:

1/(1/AB + x*dy).

Multiplying the numerator and denominator by AB gives you:

AB/(1+AB*x*dy)

Which is exactly what you got, so you were right.

This is the part that I *really*, *really* want to talk about.

"This is pretty straightforward. Paraphrasing Bayes' theorem,...
you get:
x = (m/s^2 + sum_i(xi/Vi)) / (1/s^2+sum_i(1/Vi))"

I have had my doubts about how valid the standard way of regressing a rate like batting average to the mean is. The only way that I've ever seen it done is to take a batting average, subtract the mean for that year, and then multiply the difference by the year-to-year correlation. This has obvious problems if everyone has different numbers of ABs or PAs, but there is something more insidious that people don't notice. The validity of the approach is based on the idea that the correlation equals the variance of true abilities divided by the total variance. The proof goes something like this:

Assume
P1=mu+e1,
P2=mu+e2,
cov(mu,e1)=cov(mu,e2)=cov(e1,e2)=0,
var(e1)=var(e2)=var(e)

where P1 and P2 are performance for year 1 and year 2, and mu represents the true rate or average for the person, and e1 and e2 represent the error for year 1 and 2.

1. r = cov(P1,P2)/(std(P1)*std(P2))

2. cov(P1,P2)=cov(mu+e1,mu+e2)=
cov(mu,mu)+ cov(mu,e1) + cov(mu,e2) + cov(e1,e2)=
cov(mu,mu)=
var(mu)

3. std(P1)*std(P2)=
sqrt(var(mu) + var(e1))*sqrt(var(mu) + var(e2))=
sqrt(var(mu) + var(e))*sqrt(var(mu) + var(e))=
var(mu) + var(e)

Plugging the end results of 2 and 3 back into 1, you get:

4. r = var(mu)/(var(mu) + var(e))

which is by definition the ratio of the variance of true abilities to the total variance. The problem comes when you assume that the process has autoregressive elements to it. If you assume

P1=mu+e1,
P2=mu+u1+e2,
u1 is uncorrelated with mu, e1, and e2

where u1 represents the autoregressive component of the error, then the whole thing falls apart. Cov(P1,P2) still equals var(mu), but the denominator is:

sqrt(var(mu) + var(e)) * sqrt(var(mu) + var(e) + var(u1))

The denominator no longer equals the total variance, so as you can see we have a problem. Using the correlation to forecast isn't a problem, but estimating true ability is.
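
A quick simulation shows the size of the problem (numpy, with made-up variances):

import numpy as np

rng = np.random.default_rng(1)
n = 200000
var_mu, var_e, var_u = 4.0, 9.0, 2.0

mu = rng.normal(0, var_mu**0.5, n)
p1 = mu + rng.normal(0, var_e**0.5, n)
p2 = mu + rng.normal(0, var_u**0.5, n) + rng.normal(0, var_e**0.5, n)

print(np.corrcoef(p1, p2)[0, 1])   # about var_mu/sqrt((var_mu+var_e)*(var_mu+var_e+var_u)) = .29
print(var_mu / (var_mu + var_e))   # the true-score ratio = .31 - no longer the same thing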

Your system seems to be a valid replacement. Even though you take a few shortcuts, such as assuming a uniform rate for all players and assuming that the errors are normally distributed, it seems to do the job. I suggest you publish it here or at BTN.


Marcel, The Monkey, Forecasting System (December 1, 2003)

Discussion Thread

Posted 10:55 a.m., December 8, 2003 (#42) - Alan Jordan
  Tango

Rates and odds ratios (along with logits, which are ln(odds ratios)) are usually just different ways of saying the same thing. They each have their advantages and disadvantages in this case. AED's method could be modified to use odds ratios, but I suspect it would be more complicated.

AED's method only requires a program that can run an ARIMA (p=1). This is also called autoregression, where residuals are allowed to correlate with themselves as part of the model. It's one of the simplest forms of ARIMA. Actually, we need it to include other independent variables, which is called a transfer function.

The basic idea of a transfer function goes like this.

1. Perform a regression using a set of independent variables such as age and possibly injuries or whatever you think appropriate. Save the residuals.

2. Perform a second regression where the residuals are predicted by the residuals of the year before.

I believe the square of the regression coefficient will give you x for the weight AB/(1+AB*x*dy). The way I described is unbiased in large samples but inefficient (not the most precise). Maximum likelihood solves regressions 1 & 2 at the same time to get estimates that are efficient, but again only unbiased in large samples.
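
In numpy the two-step version looks roughly like this (a sketch - it assumes the data are already laid out as player-by-year arrays; the real thing would be one maximum-likelihood pass in an ARIMA/transfer-function routine):

import numpy as np

def two_step(rate, age):
    # rate, age: arrays of shape (n_players, n_years)
    y, x = rate.ravel(), age.ravel()
    X = np.column_stack([np.ones(len(y)), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)       # step 1: regress the rate on age
    resid = (y - X @ b).reshape(rate.shape)

    cur, prev = resid[:, 1:].ravel(), resid[:, :-1].ravel()
    phi = np.linalg.lstsq(prev.reshape(-1, 1), cur, rcond=None)[0][0]  # step 2: lag-1 regression of the residuals
    return b, phi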

I have a strong hunch that it will be a lot simpler to work with means than odds ratios. I'm not even sure what the variance of r/(1-r) is.

Also note that you and Rob Wood were assuming that the only error involved was from the binomial process. AED doesn't make that assumption. He effectively allows a modified version of true ability to move up and down each year. That's probably more realistic. Injuries, one-time learning/adjustments to a swing, and other temporary changes can't be represented by the binomial part of the error. They are probably swept under that rug, but they don't really belong there.

AED's method is probably a lot more realistic than using the common correlation coefficient.


Marcel, The Monkey, Forecasting System (December 1, 2003)

Discussion Thread

Posted 9:40 p.m., December 8, 2003 (#44) - Alan Jordan
  Tango,

Several months ago you forwarded an email to me from someone who wanted one equation for regression to the mean that would handle various lags and various ABs/PAs. This may well be it.

If this works, then it can be applied to Pinto's model so the park estimates can be regressed (assuming two or more years of data).


Marcel, The Monkey, Forecasting System (December 1, 2003)

Discussion Thread

Posted 12:22 a.m., December 10, 2003 (#47) - Alan Jordan
  AED,

What I posted above about the correlation being the ratio of true variance to total variance is based on true score theory. That's different from what you're doing. One of the problems that I didn't mention in that post is that the true score model predicts that the correlation between 2003 BA and 2002 BA is the same as the correlation between 2003 BA and 2001 BA, or for that matter 1990 BA. I can't imagine how that could possibly be the case.

You say that you don't model the random walk part. That's kind of puzzling. Just allowing errors to correlate is a form of modeling them.

BTW, are you estimating the autoregressive coefficient or are you setting it equal to 1? Strictly speaking, if the AR coefficient is anything other than 1, it's not a random walk. I've been assuming that you're estimating the AR coefficient rather than setting it to 1.

I think that this is a superior model for doing regression to the mean compared to the method currently in circulation. I'm also interested in it for uses other than simple forecasting. For example, you can do a logistic regression where the dependent variable is whether the batter gets on base. One set of independent variables is who is batting. The other set is what park it is in. The coefficients for batters give you OBA corrected for park effects in logit form. With a little manipulation they can be transformed from logits to rates conditional on a certain park, or an "average park". I didn't have a way of regressing those coefficients to the mean other than the traditional correlation coefficient. With your method, I can plug in the squares of the standard errors where you have the binomial portion. I could also factor out age first. I could do the same for park factors and pitchers. You get the idea. The point isn't forecasting, but estimating talent/ability.

Again, I recommend that you publish this as a more robust way of doing regression to the mean.


Baseball Musings: Defense Archives (December 5, 2003)

Discussion Thread

Posted 4:34 p.m., December 6, 2003 (#2) - Alan Jordan
  What's wrong with the way that he presents the intermediate data?


Baseball Musings: Defense Archives (December 5, 2003)

Discussion Thread

Posted 6:04 p.m., December 6, 2003 (#4) - Alan Jordan
  Nevermind, my misunderstanding. I liked it too.


By The Numbers, Dec 7 (December 8, 2003)

Discussion Thread

Posted 8:10 a.m., December 10, 2003 (#1) - Alan Jordan
  Check out the "Accuracy of Preseason Forecast" article. Out of 5 experts, nobody beats the monkey overall. They do well with the American League, but horribly with the National League. It would be interesting to do this with more seasons. Doesn't Diamond Mind post its preseason predictions?


By The Numbers, Dec 7 (December 8, 2003)

Discussion Thread

Posted 8:53 a.m., December 10, 2003 (#2) - Alan Jordan
  Florida's a good baseball city? I would argue that there are two dimensions to being a good baseball city. One is average attendance, controlling for wins/standings, past/present, expansion, and the 94-95 strike. The other is how sensitive attendance is to winning. In a regression you model these two parts as:

attendance = Team + Team*win + win + expansion + strike

The first two terms are what we care about and the rest are just there to be controlled for. We want the first term to be positive and the second term to be near 0.


Building the 2004 Expos (December 8, 2003)

Discussion Thread

Posted 9:28 p.m., December 8, 2003 (#4) - Alan Jordan
  Dlf,

You're absolutely right. The Expos' decision was entirely optimal - for the other 29 teams. Playing for the Expos must be like playing for the occupied France team against the Germans and Japanese, except that they probably won't shoot you for crossing home plate. Yet.


Professor who developed one of computer models for BCS speaks (December 11, 2003)

Discussion Thread

Posted 10:53 p.m., December 11, 2003 (#15) - Alan Jordan
  Massey's system does use home vs. away. There are two versions of his ratings. The one he posts on his website uses points scored and allowed. The one he uses for the BCS uses only wins and losses.

http://www.masseyratings.com/theory/massey.htm

Where most people in baseball use the Pyth or some variation, Massey has a function based on the difference of the scores divided by the sum of the scores, adjusted by two constants. He told me how he got the constants, but I'd have to dig up that email. He lists it out for football in this presentation.

http://www.masseyratings.com/theory/uttalk_files/frame.htm

The Colley Matrix, on the other hand, doesn't use home-field advantage. I asked him about that once, and he stated that because home-field advantage varied so much from place to place, it was better not to model it at all. His system is pretty simple (probably the only one you could do on a single worksheet of Excel) and doesn't really have a place for it. I'm not sure how you would add it in.

The Colley Matrix essentially solves a simultaneous set of equations to derive the rankings. The equations basically represent who beat whom, with a Bayesian prior of 1/2 thrown in. The real beauty is that you don't have to use an iterative method to solve it, because it's linear. Bradley-Terry and logistic regression both require an iterative procedure to solve them.

http://www.colleyrankings.com/matrate.pdf

There is a list of links on Massey's website that includes many other people who rank teams and players from various different sports, including our own AED (look for Dolphin).

http://www.masseyratings.com/index1.htm


Professor who developed one of computer models for BCS speaks (December 11, 2003)

Discussion Thread

Posted 9:09 a.m., December 12, 2003 (#18) - Alan Jordan
  You're right, Massey is not using home-field advantage in his BCS rankings. I don't get that.

How could factoring in home-field advantage not be worth the effort? If home-field advantage is a real effect, then not factoring it in produces biased results. Even if its effect varies from team to team, adding it to the model should produce less biased results than leaving it out.

If you added a home-field advantage equation to the Colley Matrix, would you still be able to solve it without an iterative procedure?


Request for statistical assistance (December 17, 2003)

Discussion Thread

Posted 8:51 p.m., December 17, 2003 (#5) - Alan Jordan
  If I understand this correctly, you can still do a logistic regression with the data you have. You can also use linear regression to give you an approximate answer (that will probably be pretty close).

Let each y be the proportion/rate
Let independent variable 1 be the pitcher.
Let independent variable 2 be the catcher.
Let the weight be the denominator of the rate stat.

If you use the linear regression, you can show how much of the rsquare is caused by pitchers and how much by catchers.

As long as you don't specify interaction terms in your model, you get estimates that have some degree of regression to the mean in them (in large samples anyway).

A logistic regression could be run the same way, i.e. without exact matchups.
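
In statsmodels terms the linear version above would look something like this (a sketch - the DataFrame and column names are made up):

import pandas as pd
import statsmodels.api as sm

# df: hypothetical columns rate, denom, pitcher, catcher (one row per pitcher-catcher pair)
def fit_weighted(df):
    X = pd.get_dummies(df[["pitcher", "catcher"]].astype(str), drop_first=True)
    X = sm.add_constant(X.astype(float))
    return sm.WLS(df["rate"], X, weights=df["denom"]).fit()

# res = fit_weighted(df); res.params, res.pvalues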

Your method has two problems.

1. You have no proof that it works (if it even does).
2. Nobody would understand what you're doing.


Request for statistical assistance (December 17, 2003)

Discussion Thread

Posted 1:25 p.m., December 18, 2003 (#7) - Alan Jordan
  Tango,

No, I'm not referring to the method in your catcher's article. As far as I can tell it's unbiased. The method I'm talking about is attempting to use SS errors to regress balks.

You're making the assumption that balks have the same error variance as SS errors. The error variance for these events should be proportional to the rate of events per PA. If they have different rates, then forget it. Even if they have the same rates, I think you still have to make the argument that they are equivalent. Also, there are other sources of error, such as the age of the pitcher/catcher, that might add to the variance of the error. We simply don't know what part of the total variance for balks, etc. is error variance and what part is true variance. The same is true for SS errors.

In short, while it is plausible that your method for regression to the mean might work, despite the objections that I've made, I wouldn't attempt to use it until it can be shown to work.


Request for statistical assistance (December 17, 2003)

Discussion Thread

Posted 7:49 p.m., December 18, 2003 (#10) - Alan Jordan
  First, regression to the mean using a year-to-year r works when r represents the ratio of true variance to total variance. In theory, if you knew the true variance or the error variance, you could construct the ratio, because the total variance can be measured from the data directly.

For example if the total variance for a variable was 10 and we knew that the error variance was 8, then we would get a ratio of 1/5.

Total var=err var + true var
true var=total var - err var
true var=10-8
true var =2

ratio=true var / total var
ratio=2/10
ratio=1/5

We could then use the ratio 1/5 to multiply the difference of each score from the mean.

if the mean were 50 and score1 was 100 then, using the 1/5 ratio you would get
adj score=mean+(score-mean)*1/5
adj score=50 + (100-50)*1/5
adj score=60

I don't really know of a situation where we would know the error variance. For batting average, we would know the binomial part, and we might be able to figure out the year-to-year variance.

I thought that was what you were trying to do. If not, nevermind.

"For Pitcher Balks, the observed standard deviation was 2 per 162 GP, while I expected, from a purely random distribition, to be 1 per 162 GP."

Is this right? Do you mean standard deviation or rate? A standard deviation isn't usually expressed in terms of successes per trial. Also, the observed standard deviation is usually larger than an adjusted std dev or var.
I don't get what you mean by purely random distribution - is it binomial, normal, or what?

In general, I would recommend that you go back to post #5. It will tell you the average effect that catchers have on balks and other events per PA. It will also give you a hypothesis test, and you can even figure out how much of the model's r-square is solely attributable to the catchers.

Actually, I'm not 100% sure what you're doing, but then I'm sick today.


Request for statistical assistance (December 17, 2003)

Discussion Thread

Posted 5:25 p.m., December 19, 2003 (#13) - Alan Jordan
  There's a couple of things here so let me do them in order.

1. "I could, and should, have marked the rates and standard deviation of the rates as a per play, but I put it at per 162 GP."

Std dev and var are customarily expressed simply as numbers, not in units of per play, per AB, dollars or inches. That's what I meant. A standard deviation can be expressed in the same units as a rate, but it usually isn't. A variance would have to be expressed in those units squared. It confused me, that's all.

2. "So, question 1: what is the observed standard deviation of the deltas of these 10,000 flippers? ( I guess it would help if I give you the data.)"

That's easily measured by applying the standard deviation formula to all of your deltas, where each delta appears to be the difference between your score and someone else's.

3. "question 2: what is the expected standard deviation of the deltas, assuming that only luck is expected."

Assuming everyone is using a fair coin, the expected standard deviation is
sqrt(p*(1-p)/N)*1000 = sqrt(.5*.5/1000)*1000 = 15.811388.

4. "So, my question #3 is how to get a regression equation for that?"

If the only source of error were binomial then you could calculate true var/total var as

r=(total var-err var)/total var

where err var = N*p*(1-p) if the scores are head counts (equivalently, p*(1-p)/N if the scores are proportions); the error variance has to be on the same scale as the total variance

and
adj score=mean+(score-mean)*r

In your terminology, you would say that we are regressing by 1-r, or (err var/total var). A sketch of this calculation follows at the end of this post.

5. The example in post #12 is confusing. What are the deltas? Are they the # of heads-500? Are they the # of heads-your # of heads?

If the standard deviations of those two groups are known to be 0 and .028 respectively, then you would regress them 100% and 99.999% respectively.

I don't see how you get standard deviations of 32 and 64. I guess you didn't really run this coin-flipping scenario.

If binomial error is the only source of error, then I can see how a total standard deviation of 64 and an error standard deviation of 32 give you (64^2 - 32^2)/64^2 = .75, which is close to your r of .73.

The problem with using the formula that I listed in 4 is that it only deals with binomial error, and there are other sources of error in baseball, such as learning, health, and adjustments by the opposition, that can't even be modeled the way parks, age, and opposition can. Some variables, like the plate umpire, should affect balks, strikes, and walks, but we don't even bother to factor them in even though they add some amount of error variance. The formula I listed in #4 therefore underestimates the amount you need to regress.

I don't know if this clears anything up, but assuming that binomial error is the only source of error and that it is uncorrelated with the spread of true talent, then yes, there is a formula for regression to the mean. Use it at your own risk in baseball, but it should work in coin-flipping experiments.
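
A minimal Python sketch of points 3 and 4 above, assuming each flipper's score is the number of heads in 1,000 flips (simulated here, not Tango's actual file):

import numpy as np

rng = np.random.default_rng(0)
n_flippers, n_flips, p = 10_000, 1_000, 0.5

# Every flipper has the same true p, so all spread in head counts is luck.
heads = rng.binomial(n_flips, p, size=n_flippers)

total_var = heads.var()                      # observed variance of the scores
err_var = n_flips * p * (1 - p)              # binomial (luck) variance = 250
print(err_var ** 0.5)                        # 15.81..., the expected std dev from point 3

r = max(total_var - err_var, 0) / total_var  # near 0 here: no true spread in talent
mean = heads.mean()
adj = mean + (heads - mean) * r              # everyone gets regressed almost all the way back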


Request for statistical assistance (December 17, 2003)

Discussion Thread

Posted 4:36 p.m., December 20, 2003 (#17) - Alan Jordan
  Tango,

going back to post #3, where do you get that the standard deviation of random is 1? How do you know it's 1, or is this an assumption?
I guess this also applies to pitchers' balks in post #9. I would feel a little more comfortable if I understood where you got that part.

Also, my delta was just p-.5. Since p-q is about double p-.5 when p is near .5, that explains the difference in the standard deviation of error between yours and mine.


Request for statistical assistance (December 17, 2003)

Discussion Thread

Posted 12:37 a.m., December 23, 2003 (#28) - Alan Jordan
  Simulated Data

Pitcher Catcher PA 2B

1 1 2049 182
1 2 5516 64
1 3 9770 220
1 4 4327 18
1 5 6025 728
1 6 3172 27
1 7 3187 90
2 1 9900 129
2 2 697 2
2 3 3171 7
2 4 5785 5
2 5 3970 77
2 6 4110 4
2 7 9546 37
3 1 4198 97
3 2 865 2
3 3 2926 12
3 4 3072 2
3 5 7483 217
3 6 4924 6
3 7 9204 46
4 1 2793 18
4 2 8393 6
4 3 7099 5
4 4 4957 1
4 5 5266 46
4 6 6995 5
4 7 5887 6
5 1 9416 1192
5 2 5438 110
5 3 9967 359
5 4 8649 69
5 5 9512 1742
5 6 1880 23
5 7 695 29
6 1 9892 72
6 2 8733 16
6 3 2465 3
6 4 6545 1
6 5 9491 111
6 6 1865 2
6 7 1555 5
7 1 5629 130
7 2 871 2
7 3 7082 42
7 4 6103 12
7 5 6096 258
7 6 8911 15
7 7 5313 32

Use a logistic regression to model the probability of event given the pitcher and catcher. Use dummy variables for the first 6 pitchers and catchers. If all catcher dummy variables are 0, then the catcher is #7. Ditto for pitchers.

Parameter estimates (individual pitcher and catcher dummy estimates not shown):

                           Standard        Wald
Parameter    DF  Estimate     Error   Chi-Square   Pr > ChiSq
Intercept     1   -4.9170    0.0771    4068.2959       <.0001

Type 3 tests of effects:

                     Wald
Effect    DF   Chi-Square   Pr > ChiSq
catcher    6       4129.6       <.0001
pitcher    6       4887.4       <.0001

As long as you include an intercept in the model and don't specify a coefficient for each combination of pitcher and catcher, then the estimates are regressed to the grand mean.
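
For anyone who wants to reproduce this outside SAS, here is a rough sketch in Python with statsmodels, assuming the table above has been saved to a hypothetical file simulated_2b.csv with columns pitcher, catcher, PA, doubles (doubles being the 2B column):

import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("simulated_2b.csv")   # hypothetical file holding the table above

# Dummy variables for pitchers and catchers; dropping #7 makes it the reference level
X = pd.get_dummies(df[["pitcher", "catcher"]].astype("category"))
X = X.drop(columns=["pitcher_7", "catcher_7"])
X = sm.add_constant(X).astype(float)

# Binomial outcome as (events, non-events): doubles and PA - doubles
y = np.column_stack([df["doubles"], df["PA"] - df["doubles"]])

fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()
print(fit.summary())   # the intercept should come out near the -4.92 reported above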


Request for statistical assistance (December 17, 2003)

Discussion Thread

Posted 12:50 a.m., December 23, 2003 (#29) - Alan Jordan
OK, the output above didn't post completely. I can email the details if anybody wants them. The point is that people have already figured out a way to deal with this problem. It looks like we're trying to reinvent the wheel.



A method for determining the probability that a given team was the true best team in some particular year (January 6, 2004)

Discussion Thread

Posted 9:22 p.m., January 6, 2004 (#9) - Alan Jordan
  AED

When you say Gaussian error function, do you mean the cumulative as in probit (normit)?

Also, how do you KNOW that the errors are normally distributed? That strikes me as more of an assumption, since they are unobservable.
Have you run some sort of specification test to assess that?

I'm not sure that the usual B-T or logistic rankings that are done in a non-Bayesian system are a fair comparison to your system, or to one that allows priors, because most of the priors that I've seen push the estimates toward the mean. I think if you ran a probit model the way they usually run the B-T or logistic model, you would get the same effect.


A method for determining the probability that a given team was the true best team in some particular year (January 6, 2004)

Discussion Thread

Posted 10:42 a.m., January 7, 2004 (#13) - Alan Jordan
  AED,

Frey's model uses game wins as a dependent variable and the discussion centers on his model, so that's all I'm interested in at the moment.

Given the graph that I saw
http://www.dolphinsim.com/ratings/info/predicting.html

There's nothing wrong with the specification of the cumulative normal distribution, but what would the graph look like if you used the logistic distribution, which is what the B-T model uses? The two distributions are so similar that you might get approximately the same fit.

Yes, I understand that the calculations are simpler, but I don't see that the logistic is wrong.

Also, the B-T models that I've seen don't use priors or any sort of Bayesian adjustment, so I would expect them to overfit the data. My question is whether a B-T (logistic) and a cumulative normal (probit) model that both don't use any priors would overpredict upsets, and whether, if they both used priors, either would overpredict the number of upsets.

It seems to me that Frey's model shouldn't overpredict major upsets as much as the approach I usually use (a simple logistic using dummy variables for teams and no priors). My estimates would be too extreme, and therefore the predictions would be too extreme (I don't care, because I'm focusing on rankings, not probability estimation per se). I doubt my model would improve by switching to the probit function.

If I wanted to focus on probability estimation, I would have to add some sort of prior distribution or Stein estimator or Bayesian model averaging or something to tone down the extreme predictions especially in the early part of the season.


A method for determining the probability that a given team was the true best team in some particular year (January 6, 2004)

Discussion Thread

Posted 12:36 a.m., January 8, 2004 (#17) - Alan Jordan
  "The line in that graph is not a "fit" to the data, it is the model prediction plotted against the data."

When you overlay actual over predicted values, you are visually inspecting goodness of fit. We could divide the data into groups and do a chi-square test. I know you know how to do tests like that, and the chi-square version is called a goodness of fit test. Why do you object to the word fit?

"Actually, for accurate rankings you should still use a prior. Otherwise you are prone to overranking teams with easier schedules."

I don't do any rankings at all until the teams have played at least 30 games minimum. That's more than two years' worth of data for a college football team. By that time even the Tigers have won at least one game and even the Yankees have lost 7, so there's no complete or quasi-separation of data points. Teams have generally played each other enough (within the league at least) that the matrix is invertible without resorting to a generalized inverse or setting a team coefficient to 0 (OK, I have to set one National League team and one American League team coefficient to 0 until interleague play). Also, the effect of scheduling (i.e., a good team does better playing against mediocre teams than against half really good and half really bad teams) doesn't hold up when winning percentages are between 25% and 75% and you have a larger sample size (I ran a Monte Carlo simulation at 100). College football has more unforgiving conditions than baseball.


A method for determining the probability that a given team was the true best team in some particular year (January 6, 2004)

Discussion Thread

Posted 12:47 a.m., January 8, 2004 (#18) - Alan Jordan
  "One problem that I think would make an accurate analysis of the problem more complicated than is presented is that any team that begins the playoffs is generally not really the same team that started the season."

That's a more interesting problem. Assuming we can come up with a reasonable model for calculating team strength at different points in the season, do we grade a team on its overall average throughout the season, or do we grade it on its strength at the end of the season? Both have merit.

Playoffs tend to reward the team that's strongest at the end of the season, while getting into the playoffs depends on average strength across the regular season. If teams had stable strengths across the season, then there would be no conflict. Of course we don't really believe that teams don't change strength across the season.


A method for determining the probability that a given team was the true best team in some particular year (January 6, 2004)

Discussion Thread

Posted 12:58 a.m., January 8, 2004 (#19) - Alan Jordan
  "I remember seeing a study that suggested that even teams playing identical schedules could have significatly different effective schedule strengths, because of random fluctuations in which opposing starters they faced."

If the problem is only the starting pitchers, then you add coefficients for starting pitchers to the model. At this point using priors becomes more important, but it's not catastrophic statistically. To rank teams, you treat each pitcher and team combination as if it were a separate team and then take a weighted average of these for each team to get a team strength.

On the other hand, injuries are harder to deal with. What if 2 or 3 star players are injured? I don't have a problem with docking that team for its poor performance, but a team that beats them shouldn't get many points either. I'm not sure it's big enough to worry about in baseball. In ranking the NFL, though, I would definitely have a Falcons team with Vick and one without Vick.


A method for determining the probability that a given team was the true best team in some particular year (January 6, 2004)

Discussion Thread

Posted 8:50 p.m., January 8, 2004 (#21) - Alan Jordan
  "You can't treat the pitcher/team combinations exactly as if they are different teams, since the offense is fairly constant regardless of who is pitching."

I have a set of dummy variables for each team and a set of dummy variables for each pitcher/team. Pitcher/teams that have fewer than 5 starts get grouped together by team. I have to admit this setup needs priors, because there are still quasi-separations even at the end of the season, and the matrix requires a generalized inverse (some parameters get set to 0).

"Regarding injuries, I have generally found injuries to be less significant than is commonly thought."

I agree, and I've never factored injuries into a model. I have left them out because I don't see a simple way of factoring them in without throwing subjectivity into it. If a team is playing better or worse at the beginning or end of the season, an opponent who plays them when they are better should get more points than a team that plays them when they are worse.

Whether a team actually changes strength by any appreciable amount is another question.

"(at the risk of interpreting noise as signal) you would conclude the turnaround happened during the bye, not at Vick's return."

Hypothetically yes, but I'm sure you've studied the effect of bye weeks on performance in the next and following weeks. I don't know if you found an effect for the week after the bye, but I doubt you found an effect for the second week after the bye. I did a check a couple of years ago where the dependent variable was win/loss and the independent variable was bye/no bye. I didn't find anything, but it was only one year's worth of data, so maybe the sample was too small.

Luck may be enough to explain the difference between 2-10 without Vick and 3-1 with him (Fisher's exact test gives a p-value of .063; the chi-square test is invalid because of the sample size), but the bye week isn't.
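
The Fisher's exact p-value quoted above can be checked with a couple of lines of Python (scipy), treating the records as a 2x2 table of wins and losses:

from scipy.stats import fisher_exact

# Rows: without Vick, with Vick; columns: wins, losses
table = [[2, 10],
         [3, 1]]
odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(round(p_value, 3))   # about 0.063, matching the figure above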


A method for determining the probability that a given team was the true best team in some particular year (January 6, 2004)

Discussion Thread

Posted 9:54 p.m., January 9, 2004 (#23) - Alan Jordan
Here is an interesting model. It's done for college football and is primarily intended to handle wins/losses, but the author puts forward a modification for handling margin of victory. It has priors, but still uses iteratively reweighted least squares (in this case, a penalized maximum likelihood).

AED, I'd be interested in your comments.

http://members.accesstoledo.com/measefam/paper.pdf

He uses a probit. The theoretical justification is that if the process is additive, then the central limit theorem kicks in and forces a Gaussian distribution (if I recall correctly, there is a version of the CLT that says the terms don't even have to come from the same distribution).

I think also that if Y is the product of a series of variables, then Y should be distributed in an exponential(?) distribution.

Maybe that's the difference between logistic and probit: probit assumes Y is a sum and logistic assumes that Y is a product. I don't know.


A method for determining the probability that a given team was the true best team in some particular year (January 6, 2004)

Discussion Thread

Posted 1:26 p.m., January 10, 2004 (#26) - Alan Jordan
I personally love the system because it fits my limitations in mathematics and programming. My knowledge of calculus and matrix algebra is weak. I can program some basic linear algebra in SAS's IML, but I'm lost the minute you begin any nonlinear optimization for maximum likelihood.

I'm particularly good at milking SAS's procs in such a way as to test or estimate stuff.

This system allows me to use proc logistic to estimate team strengths. That I can do.

* I agree about using all the football teams instead of just lumping the IAA together.

* His prior works by popping pseudo games into the data. If you can show me how to do that with your prior, I'll do it. I just don't see how to do it.

* His priors seem to me to function as a shrinkage estimator, like the Stein estimator. The strength estimates should be pushed toward 0. I find that attractive.

* I was wondering especially about his margin of victory method.

* I don't see how to do margin of victory your way in SAS. I was thinking you were using the cumulative normal. Could I use a linear model with your margin of victory?


A method for determining the probability that a given team was the true best team in some particular year (January 6, 2004)

Discussion Thread

Posted 8:09 p.m., January 11, 2004 (#28) - Alan Jordan
SAS is strictly a frequentist statistical program when it comes to its procedures. The procedures for general linear models allow specification of independent variables, dependent variables, weights (or counts) for observations, interaction terms, and stepwise parameters, but not priors. One of the selling points of that article is that his method spells out a way of getting frequentist software to do Bayesian estimation.

Even if his method were exactly like yours or other people's, he would still get points for translating such a system into frequentist software. It's really more of a teaching journal than a cutting-edge statistical journal, and it is consequently more readable for people with a lower math background, like me.

The whole idea of using penalized maximum likelihood or least squares to do Bayesian estimation only works if you know how to add an augmented pseudo data set of a correct or plausible form.

His way (an augmented pseudo data set) is pretty simple for wins/losses, but I'm not sure how to do it for margin of victory. I understand that his way for wins/losses translates to a beta distribution with mean a/(a+b), where a and b are the pseudo wins and losses that are added to the augmented data matrix.


A method for determining the probability that a given team was the true best team in some particular year (January 6, 2004)

Discussion Thread

Posted 9:21 p.m., January 12, 2004 (#30) - Alan Jordan
Phi(x) can be raised to any arbitrary power. Alpha-1 represents a win added to the data matrix and beta-1 represents a loss. If t is the number of teams, then you need a 2t x t matrix to add to the bottom of the data matrix.

Mease acts as if alpha-1 and beta-1 must be expressed in whole numbers. This isn't necessary. You can assign weights of any rational value.

Where he suggests that ties be entered once and wins and losses be entered twice, one can represent ties by adding a win and a loss for that combination of teams and giving them a weight of 1/2. I don't see how he missed that. The two biggest stat packages (and countless others) allow for weights.

Phi^31 would be represented by adding 30 wins and 30 losses to the data matrix for each combination of teams. Again, only a 2t x t matrix needs to be entered, with the pseudo games given a weight of 30.

Basically alpha-1 and beta-1 are the weights.
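
Here is a minimal sketch of how that 2t x t block of weighted pseudo-games might be built, assuming t teams coded as dummy columns, a reference (all-zero) pseudo-opponent, and some logistic routine that accepts per-row weights (the team count and alpha are illustrative, not Mease's actual values):

import numpy as np

t = 5                      # number of teams (illustrative)
alpha = beta = 31.0        # so alpha - 1 = beta - 1 = 30 pseudo wins/losses per team

# One pseudo win and one pseudo loss per team against the reference (all-zero) opponent:
# a 2t x t block of dummy rows to append below the real design matrix.
X_pseudo = np.vstack([np.eye(t), np.eye(t)])
y_pseudo = np.concatenate([np.ones(t), np.zeros(t)])   # 1 = win, 0 = loss
w_pseudo = np.full(2 * t, alpha - 1.0)                 # each pseudo row weighted 30

# X_real, y_real, w_real would come from the actual schedule (weights of 1);
# stack them with this block and fit any weighted logistic regression.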

I think I may have found someone who has written a program in SAS's matrix language to handle the linear bayesian estimation for the margin of victory.

I wonder what the prior would mean if I just added pseudo games where each team played

2 std below
1 std below
equal
1 std above
2 std above

where each game was weighted 12 apiece (to add up to 60, as in the wins/losses model).

I wonder if this would be a valid penalized least squares solution for baseball.



Obscure Rule Flags Students Who Sharply Improve SAT Scores (January 21, 2004)

Discussion Thread

Posted 9:11 p.m., January 22, 2004 (#22) - Alan Jordan
One of the problems with identifying people as cheaters is what's commonly called the false positive rate: if a test labels someone as a cheater, what are the odds that the person is not a cheater? This is a real problem when you are trying to identify rare events.

Let's say hypothetically that I have a test that identifies cheaters. It correctly classifies cheaters as cheaters 99% of the time and correctly classifies non-cheaters as non-cheaters 99% of the time. What are the odds that a person who is classified as a cheater is actually a cheater?

You need Bayes theorem and an estimate of what percentage are cheating.

If you assume that 1% of test takers cheat, then with the assumptions that I listed above, the odds are 50% that the person classified by the test is not a child molester.

Suppose that its ability to correctly classify cheaters or non-cheaters were less than .99. Then, with an estimated 1% of cheaters, it would actually be more likely that a person classified as a cheater was NOT a cheater.


Obscure Rule Flags Students Who Sharply Improve SAT Scores (January 21, 2004)

Discussion Thread

Posted 10:12 a.m., January 23, 2004 (#26) - Alan Jordan
  Michael,

"Yeah, but Alan Jordan where did you get the "child molester" part? I don't think you mean cheater == child molester"

Yes "Child Molester" should have been "cheater".

"At school we actually had some pretty nifty cheat detection programs in computer science where it would test student's computer program submissions and find numerous cases where students had copied other student's work (even from previous years)."

What were the sensitivities and specificities for this and what was the estimated percentage of cheaters?

The answer is A.

Confused,

Quite a screwup, wasn't it?


Obscure Rule Flags Students Who Sharply Improve SAT Scores (January 21, 2004)

Discussion Thread

Posted 2:16 p.m., January 24, 2004 (#31) - Alan Jordan
Actually, I figured out how to do that problem a couple of years ago without realizing I was using Bayes' rule or theorem. I probably didn't even know what it was at the time.

1. Start off with the number of green and yellow taxis. In other problems where they give a percentage (prevalence) instead of counts, you can arbitrarily pick a number for the total, like 100 or 1,000, and then multiply your prevalence by your arbitrary total.

2. Figure out how many greens (positives) are correctly classified. We have only 5 green taxis and 80% of 5 is 4.

3. Figure out how many yellows are incorrectly classified. We have 95 yellows and 20% of 95 is 19.

4. Divide the number of Correctly classified greens by the sum of correctly classified greens and incorrectly classified yellows. Remember that both incorrectly classified yellows and correctly classified greens will be the ones identified as greens. As J. Cross pointed out above, 4/(4+19) is .174 or 17.4%

In this example the percentage of greens correctly classified is equal to the percentage of yellows correctly classified. In most situations that isn't true. If you are trying to figure out whether a person will get into college based on their SAT, then there are literally 1,599 cut points that you could use to group people into high or low. The higher you pick your cut point, the better the correct classification for the high group, but the worse the classification for the low group. Because of that, there are two terms for the correct classification rate, depending on whether it is a positive or a negative.

Sensitivity - the percent of positives (green taxis) correctly classified.

Specificity - the percent of negatives (yellow taxis) correctly classified.

The probability that a subject classified as a positive is actually a positive is

prevalence*sensitivity / (prevalence*sensitivity + [1-prevalence]*[1-specificity])

where prevalence is the percentage of positives (greens).
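
A quick Python check of that formula against the two examples in this thread:

def positive_predictive_value(prevalence, sensitivity, specificity):
    # P(actually positive | classified positive), by Bayes' rule
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

# Taxi example: 5% green, 80% correct on both colors -> about 17.4%
print(positive_predictive_value(0.05, 0.80, 0.80))   # 0.1739...
# Cheating example: 1% cheaters, 99% correct both ways -> 50%
print(positive_predictive_value(0.01, 0.99, 0.99))   # 0.5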



Clutch Hitters (January 27, 2004)

Discussion Thread

Posted 9:30 p.m., January 27, 2004 (#3) - Alan Jordan
  I'm sure that everyone else reading this could answer this question, but what is LI?


Clutch Hitting: Fact or Fiction? (February 2, 2004)

Discussion Thread

Posted 10:53 a.m., February 3, 2004 (#10) - Alan Jordan
  It's a very interesting study. It's the first evidence that I've seen that clutch hitting exists.

This may be semantics, but I'm not ready to call someone a choker because his OBA is lower in clutch situations. I would feel more comfortable using the word choker for someone whose OPS is lower in clutch situations. If a batter is swinging for the fences, he's obviously choosing to sacrifice his probability of getting on base in order to increase his probability of hitting a home run. Since we aren't measuring his odds of hitting a home run, we are getting an incomplete measure of his contribution at the plate. What we are measuring may just be the tendency of sluggers to swing for the fences.

Perhaps we should be looking at whether a runner crossed the plate, or at the number of runners who crossed the plate versus the expected number of runners who cross the plate. I don't see a simple way of testing that.


Clutch Hitting: Fact or Fiction? (February 2, 2004)

Discussion Thread

Posted 10:06 a.m., February 4, 2004 (#41) - Alan Jordan
I'm with David on this. I'm willing to accept as "Steel Balls" a player who hits BETTER than his non-clutch average, controlling for pitching. What we have evidence for so far is that hitters hit worse in clutch situations and some don't hit as badly (some may hit better). It may actually be that nobody hits better in the clutch than in non-clutch situations.

I think AED has found something, I'm just not sure what yet.


Clutch Hitting: Fact or Fiction? (February 2, 2004)

Discussion Thread

Posted 8:58 a.m., February 9, 2004 (#93) - Alan Jordan
  Tango,
I don't know if you got my email or not, but I analyzed the data that you put up for OBA by the five categories of LI. I used logistic regression. There simply wasn't a statistically significant effect for differences in clutch hitting. The data could be explained by differences in players' hitting ability and LI. Adding terms for players hitting differently by LI didn't add any predictive ability.

I can provide details if anyone is interested.


Clutch Hitting: Fact or Fiction? (February 2, 2004)

Discussion Thread

Posted 12:03 p.m., February 9, 2004 (#96) - Alan Jordan
  I'll write it up tonight. Work calls.


Clutch Hitting: Logisitic Regression (PDF) (February 10, 2004)

Discussion Thread

Posted 2:08 p.m., February 10, 2004 (#2) - Alan Jordan
  "Actually, it represents the probability - if the null is true - that this analysis would yield the current results."

That's a much better way of stating it. I wrote it up pretty quickly last night, and I wouldn't be surprised if there are whole words missing in places.

"However, this does not refute the possibility that there are a smaller subset of players who do perform better in clutch situations."

No, it doesn't. There may be a small subset that is affected, or there may just be a really small effect. Testing for a small subset is problematic because, statistically, it's cheating to look at the results to identify who has the biggest differences in the clutch and then select those batters out for analysis. You could use one year's worth of data to select and another year's worth of data to test. Actually, you could divide your data into two groups any number of ways, such as odd days versus even days. The trick is to select on one group and test on the other. My gut feeling is that if there were something there, half a million cases would have found it already.

"Also, I admit I didn't read through the past threads on clutch ability, so I may have missed this, but wouldn't clutch hitting (hitting in high leverage situations) be affected by clutch pitching (pitching in high leverage situations)?"

Perhaps in this data set, because we are only looking at it from the batter's perspective. If you had data at the PA level, then you could control for that by factoring in who was pitching and adding terms for clutch pitching. I wasn't able to control for that in this data set.


Clutch Hitting: Logisitic Regression (PDF) (February 10, 2004)

Discussion Thread

Posted 2:28 p.m., February 10, 2004 (#4) - Alan Jordan
  Tango, you asked why LI groups have to be factored into the logistic regression.

Excerpt from email:

"What I mean to suggest is that you don't have the
extra term, because you can simply normalize each LI
against the league.

Let's say we have:
Giambi: .400,.410,.420,.430,.520
league: .340,.350,.360,.370,.380

why not have this as:
Giambi,LI0,+.06
Giambi,LI1,+.06
Giambi,LI2,+.06
Giambi,LI3,+.06
Giambi,LI4,+.14

(and do this for all players), and run the regression
based on playerid and LI only?"

The main reason is that we already have a way of factoring in LI through our function prob(Y given X) = exp(X)/(1+exp(X)). By trying to adjust the data in the way you are talking about, you are treating the probability of Y as if it were linear with respect to LI, but logistic with respect to batters and clutch hitting. I wouldn't treat probabilities as linear unless I absolutely had to, because it's a misspecification.

The other reason is that I need to work with events (times on base) and chances (PA). These have to be positive, because you can't get on base -5 times out of 20. You posted OBAs and PAs. I calculated the number of times on base as round(OBA*PA). I wouldn't have even been able to run a linear regression using the method that you describe. In order to do a linear regression (actually an ANOVA), I would need to have the variance for each group of PAs and the variance of all the data combined in one group (I think that's sufficient). I might actually need to have all the data PA by PA. That's why I can't analyze your lwts data in the same file.


Clutch Hitting: Logisitic Regression (PDF) (February 10, 2004)

Discussion Thread

Posted 4:41 p.m., February 10, 2004 (#6) - Alan Jordan
  "Well, hold on a minute... As I noted in post #104 of the clutch hitting thread, I found a chi^2 of 1.04 for Tango's data, with random S.D. of 0.077 in chi^2. This gives a 30% likelihood that these data (or less consistent data) could have been produced without a clutch factor. You find a 31% chance. Therefore you are confirming my analysis of Tango's data, not contradicting it."

I have to admit that I missed that post. My understanding from post #43 was that Tango, at least, was convinced that an effect for clutch hitting was detectable in the OBA (not OBSlwts) data from 1999-2002.

"In regards to my study, your statements are misleading. You did not determine that players do not perform differently in the clutch; rather you determined that any clutch factor was sufficiently small that it could not be definitively detected in four years' of data."

I took pains to say that I hadn't proven the nonexistence of clutch hitting.

"This says absolutely nothing about whether or not it can be detected at the 2-sigma level using 24 years' of data."

This is where I think you have a valid beef.

"Given that our techniques give the same results on Tango's data, if anything your calculations show that mine are right and thus that the results from my larger data sample (analyzed similarly) are probably also right."

We both found no effect on Tango's OBA data. That's about all we can say.

Had I caught post #104, I would have written it up differently. I agree with you now that it doesn't contradict your findings, it only fails to validate them on a smaller sample.


Clutch Hitting: Logisitic Regression (PDF) (February 10, 2004)

Discussion Thread

Posted 4:48 p.m., February 10, 2004 (#7) - Alan Jordan
  "Agreed. I really meant:

x= .400/(1-.400) all divided by .340/(1-.340)
newOBA = x/(x+1)"

That's a lot better, and it is equivalent to ln(.4/(1-.4)) - ln(.34/(1-.34)), which fits directly into the logistic function. However, when you are doing hypothesis testing, it's best to avoid doing those adjustments beforehand. By putting them in as properly specified independent variables, you avoid adding bias and imprecision (inefficiency) to the estimates and their variance-covariance matrix. You also get correct degrees of freedom for the hypothesis tests.


Clutch Hitting: Logisitic Regression (PDF) (February 10, 2004)

Discussion Thread

Posted 10:44 p.m., February 10, 2004 (#9) - Alan Jordan
  "Alan, it is customary to provide upper limits for non-detections. In other words, how large would the 'clutch effect' have to be for you to detect it? I'd guess that you're only sensitive to clutch if the standard deviation of the clutch talent distribution is 0.015 or higher. Can you quantify this more precisely?"

It's not as customary as it should be. I know how to estimate power and the necessary sample size for a single coefficient, but we are testing a group of coefficients, and I can't find a formula for that in Hosmer & Lemeshow. I took a look at using the chi-square to estimate the necessary sample size, but I'm pulling theory out of my ass to get an answer. It goes like this: chi-square is proportional to effect*sample size. Estimate the effect from chi-square/sample size and then estimate the sample size needed to reach the critical alpha for a chi-square with 1,699 DF. The results suggest that with about 2 more years' worth of data, I'll have a chi-square significant at the p<=.0001 level. I find that hard to believe.

"Actually I noted my disagreement several times (#65, #69, #104), but that thread seems to have gotten hijacked by win advancement minutae so I fully understand how things get missed..."

There seemed to be a mosquito in that thread that no one could swat.
It was tedious to wade through because of all that discussion of the net effect of a PA on offense and defense.

It was unnerving to think that your methodology wasn't working. It's somewhat of a relief to see it didn't find anything on Tango's data.

I still have doubts about clutch hitting, because even with 24 years' worth of data, you still couldn't find an effect significant at the p<=.0001 level. Twenty-four years of data is the statistical equivalent of an electron microscope: you can see effects that are too small to be of any practical value to anyone. Conceivably we could collect data for another 24 years and not replicate your findings.

That said, I just finished a test run of a logistic regression of the 1999-2002 data at the PA level. I used your criteria for clutch (6th inning or later ...). Anyway, it doesn't find clutch hitting or clutch pitching, and the average drop in OBA in clutch situations is almost, but not entirely, explained by pitching. It's a test run, and I'll probably post some of the code on Fanhome this weekend to verify that I'm doing it right before I do the final run.


Clutch Hitting: Logisitic Regression (PDF) (February 10, 2004)

Discussion Thread

Posted 12:44 p.m., February 11, 2004 (#11) - Alan Jordan
Nobody accused you of making that claim. I just think that with the massive amounts of data involved, we should hold it to a higher standard than .009. I routinely ignore effects at p < .009 at my job, and I have at most 30,000 cases to work with. I sometimes ignore effects at p < .0001 if the increase in the area under the receiver operating characteristic (ROC) curve is less than .005. When you have small sample sizes you have to be more liberal. You are probably working with 3 million cases, and that should be enough to get us p < .0001.

I have to admit that because it's a high-leverage effect, it doesn't have to be as big to affect the outcome of a game. That makes it different from other effects. Still, I question whether we can estimate the effect for a player with enough precision or accuracy to justify putting it into a model. Given that we bastardize park factors (express them as linear multiplicative factors instead of as odds ratios, among other shortcuts), I just can't see that this is big enough to warrant inclusion in a model.

The last problem I have is that OBA is a really incomplete measure of clutch hitting. To address it fully, we need to look at all the outcomes of a PA. OBA treats a walk the same as a HR. I think the linear weights is a better way of looking at it. Even better would be a multinomial model. I don't have the computer power at home to estimate that for 24 years worth of data. I would need access to a university's Unix system. It would literally take days, assuming the job didn't explode.

Enough of that. I'll make some changes to the write up so that Tango can replace the one here. It should be fixed.


Clutch Hitting: Logisitic Regression (PDF) (February 10, 2004)

Discussion Thread

Posted 2:17 p.m., February 11, 2004 (#13) - Alan Jordan
By multinomial, I mean multinomial logistic regression. In regular binomial logistic regression you have two outcomes: yes or no, on base or not on base, etc. In multinomial, you can have more than two. The basic difference is that if you have k outcomes, you need k-1 equations, and the odds ratios are defined differently. Instead of ratios being defined as p/(1-p), you have p(i=j)/p(i=k). That is, one outcome is always defined as the reference, and it's the denominator in the odds ratios. For example, if you have 3 outcomes and the probabilities are .3, .5, and .2, then one outcome gets picked to be the reference category, let's say the last outcome. So the odds ratios are:

outcome1 = p(i=1)/p(i=k) = .3/.2
outcome2 = p(i=2)/p(i=k) = .5/.2

Models are estimated using the natural log of these odds ratios as dependent variables.
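
A tiny Python illustration of those reference-category log-odds, using the .3/.5/.2 example above:

import numpy as np

p = np.array([0.3, 0.5, 0.2])          # the three outcome probabilities
ref = p[-1]                            # last outcome as the reference category

log_odds = np.log(p[:-1] / ref)        # ln(.3/.2) and ln(.5/.2), the two generalized logits
print(log_odds)

# Inverting: the reference category has log-odds 0 by definition
expo = np.exp(np.append(log_odds, 0.0))
print(expo / expo.sum())               # recovers [0.3, 0.5, 0.2]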

Tango, this should look familiar from the match up method I posted a couple of months ago. This is what I based it on.

I don't know how well the lwts would work as an approximation. By the time you combine those outcomes in linear combinations, you have to treat the result as a continuous variable. That's a standard least squares problem, which would run (relatively) quickly. Any bias introduced by treating a nonlinear relation as a linear one contaminates the results. The key question is how much, and I don't know the answer to that.


Clutch Hitting: Logisitic Regression (PDF) (February 10, 2004)

Discussion Thread

Posted 2:45 p.m., February 11, 2004 (#15) - Alan Jordan
  "For each player/LI category, right?"

Yes

"In terms of least-squares, since the BB is worth less than the HR, would you weight the ln(bb/out) less than ln(hr/out)?"

For the linear regression, you take the lwts weight for a BB and that's literally the value of the dependent variable for that PA. The same goes with HR or any other outcome. The idea is that weight represents the average runs produced by that PA.


Clutch Hitting: Logisitic Regression (PDF) (February 10, 2004)

Discussion Thread

Posted 9:06 p.m., February 11, 2004 (#18) - Alan Jordan
  "If sample size is killing you now, won't it be worse trying to measure clutch changes in triples rates?"

I prefer the term f*cking computationally prohibitive.

I have a proposal. Your method is binomial. However, I believe your method can be made multinomial in the following way. Divide outcomes into:

Singles
xtra base (doubles & triples)
HR
Strike out
Walk
Out from BIP (ground outs, fly outs, double plays, triple plays &
fielder's choice)

Use either strikeouts or out from BIP as the reference category.

Calculate a singles rate as singles/(singles + outs from BIP), then an xtrabase rate as xtrabase/(xtrabase+outs from BIP), etc...

Divide PAs into clutch and nonclutch, and do the same analysis you did with OBA. You will then end up with 5 separate variances estimated with 5 separate chi-squares (of 1 DF).

As long as these chi-squares are independent, you can add them up into one chi-square with 5 degrees of freedom. This should give you a more powerful test. It will also allow you to isolate where any effects are.
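
For that last step, combining the five independent 1-DF chi-squares into one 5-DF test is a one-liner; here is a sketch with made-up chi-square values (not real results):

from scipy.stats import chi2

# Hypothetical 1-DF chi-squares, one per event category (1B, XBH, HR, SO, BB)
per_category = [2.1, 0.4, 3.0, 0.2, 1.1]

combined = sum(per_category)                  # independent chi-squares add
p_value = chi2.sf(combined, df=len(per_category))
print(combined, p_value)                      # one overall 5-DF test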

If we can form a correlation or covariance matrix of these five effects, we could estimate a model that we could plug into lwts or BSR or something to quantify value. My nonprescribed drugs are wearing off and things become fuzzy about here.



EconPapers: Steven Levitt (February 24, 2004)

Discussion Thread

Posted 11:48 p.m., February 24, 2004 (#4) - Alan Jordan
It's actually about soccer, at the bottom, just under the heading numbered 4. What's cool is that the first reference is from John Nash of "A Beautiful Mind". It's testing Nash's equilibrium (which won him a Nobel prize in economics) in soccer.


EconPapers: Steven Levitt (February 24, 2004)

Discussion Thread

Posted 10:08 p.m., February 29, 2004 (#9) - Alan Jordan
  Joe,

Remember that most gamblers use intuition rather than complex models and inside information to make their decisions. There are some gamblers who can handicap the games better than the bookies, but as long as they are a minority, they're not a problem. It's not a contradiction for a group of gamblers to be better than the bookies while the bookies are still better than the gamblers on the whole.


More Help Requested (March 4, 2004)

Discussion Thread

Posted 9:42 p.m., March 4, 2004 (#2) - Alan Jordan
What you are seeing is called a halo effect. That is, ratings for different traits are correlated based on a general emotional preference. It's common in political polls and marketing studies (people who liked Clinton were more likely to rate him as trustworthy, and people who liked Quayle were more likely to rate him as intelligent).
It's considered an almost intractable problem. The guy in question would probably say Bernie's children were biologically possible.

Here are two other points of view on data removal.

1. Don't throw any away. You're asking for opinions, and even uninformed opinions have some merit. If they don't, then you need to survey only experts or people with a minimum competency. Also, some people might rate all players very low/high. These people's data might be removed even if their rankings correlate well with the total. Trimming data (deleting cases beyond a cutoff) and winsorizing (changing cases beyond a cutoff to the cutoff itself) can cause their own problems if not done right. They reduce sample size and response variance. You can be removing signal as well as noise by removing cases.

2. Remove cases where there is no variance within player ratings. One rule might be: if sum(std(player ratings)) <= C, then delete, where C is some constant, possibly 0. If a respondent shows very little variance between players and within players, then he is basically adding a constant to all ratings. In effect he has removed himself.

Whatever you decide to do, you should analyze the data with and without the change. If there are differences, then note them even if you focus your report on one method.


More Help Requested (March 4, 2004)

Discussion Thread

Posted 2:14 p.m., March 5, 2004 (#8) - Alan Jordan
If each row has the same score for every rating except one, and that rating differs by only one point from the rest, as in the following two examples
1 1 1 1 1 1 2
5 5 5 5 5 5 4

then the variance for that row is .143. That is incredibly low variance. The average row variance for this guy is .067, which is well below .143. Changing sum(std(player ratings))<=C to mean(var(player ratings))<=.143 (since people rated different numbers of players), this person gets cut.
You can make C any value that you feel comfortable with (.143 is really, really low). That way you have an easily coded, objective measure of crap responses. You can also add other criteria with OR statements if you want, as in the sketch below. Using Bernie's arm rating itself as a criterion for removal tends to make the ratings look like yours.
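
A minimal sketch of that rule in Python, with the two example rows above standing in for one respondent's ratings (real survey data would be loaded instead):

import numpy as np

# Rows = players rated by one respondent, columns = the trait ratings (1-5 scale)
ratings = np.array([
    [1, 1, 1, 1, 1, 1, 2],
    [5, 5, 5, 5, 5, 5, 4],
])

row_var = ratings.var(axis=1, ddof=1)   # variance within each player's row of ratings
print(row_var)                          # [0.143, 0.143]

C = 0.143
if row_var.mean() <= C:
    print("flag this respondent as a crap response")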


More Help Requested (March 4, 2004)

Discussion Thread

Posted 4:39 p.m., March 5, 2004 (#19) - Alan Jordan
  Tango,
do you need to analyze anything other than means and standard deviations for this project? Are you planning on doing regressions or ANOVAs with this data? If not, you can treat the morons and the halo effect as random error that cancels itself out in large sample sizes.


More Help Requested (March 4, 2004)

Discussion Thread

Posted 5:56 p.m., March 5, 2004 (#21) - Alan Jordan
  "So, 75% agree that he's a 1. 92.5% agree that he's a 1 or 2. 97.5% agree that he's a 1,2,3. 97.5 as 1,2,3,4."

As long as the data are unimodal (only one peak), you can use the percentage at the mode as your measure of central tendency and the percentage within 1 as a confidence interval. If the data are bimodal, then you have a problem.

"In terms of "level of agreement", what if I weight the first number as "4", the second as "3", the third as "2", and the fourth as "1". This will give me a level of agreement of: 87%."

I'm not sure what this would mean to anyone but you. Also, the level of agreement would be higher if 3 were the mode and 1 & 5 were only two away instead of 4. You could scale that, but it then becomes even harder to explain.


More Help Requested (March 4, 2004)

Discussion Thread

Posted 6:58 p.m., March 20, 2004 (#27) - Alan Jordan
I don't understand why you would have a term for the minimum number of votes in the equation. It appears that as m approaches infinity, wr approaches C, and as m approaches 0, wr approaches r. Can we get AED to give us a rationale for this equation?

Is it any better than doing a weighted average where we give the overall average a specific weight like 10, 20 or 100 votes?

I don't think I could justify a specific weight, but at least I would understand what I'm doing.
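
For what it's worth, here is a small sketch of that kind of weighted average, assuming the equation being discussed has the usual form wr = (v*R + m*C)/(v + m), where R is the movie's mean vote, v its vote count, C the overall mean (6.9 here), and m the weight given to C:

def weighted_rating(R, v, C, m):
    # Weighted average of the movie's own mean and the overall mean
    return (v * R + m * C) / (v + m)

print(weighted_rating(R=8.5, v=500, C=6.9, m=1250))    # pulled well toward 6.9
print(weighted_rating(R=8.5, v=500, C=6.9, m=0))       # m -> 0 gives back R
print(weighted_rating(R=8.5, v=500, C=6.9, m=10**9))   # huge m approaches C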


More Help Requested (March 4, 2004)

Discussion Thread

Posted 9:34 p.m., March 20, 2004 (#29) - Alan Jordan
I get how the weighting works. There are a wide variety of weighting schemes that would be defensible, whether they are Bayesian or not. What I don't get is how this is Bayesian. I didn't think Bayesian systems used cutoffs for inclusion. If you call something Bayesian, then you start off with a set of assumptions and derive an equation that will give you the appropriate answer. I'm just not smart enough to look at this and see how it's Bayesian.

I'm not saying that it's inadequate in any way.


More Help Requested (March 4, 2004)

Discussion Thread

Posted 12:51 a.m., March 21, 2004 (#31) - Alan Jordan
I think I get it. M isn't the minimum number of votes needed to get into the list; it's the minimum number of votes among those already in.
If there were a cutoff number of votes, then there could be more or fewer than 250. What if 350 movies have more than 1,250 votes - do they all go into the top *250*?
I think membership in the top 250 is based on the number of votes, and 1,250 is approximately the number of votes that #250 has. Ranking of the 250 is then based on their weighted average. They decided to give the 6.9 overall average the arbitrary (as far as I can see) weight of the number of votes of the lowest-ranked movie.

At first glance 1,250 seems way too high, but it has the effect of giving the movies with the most positive votes more of a push.


More Help Requested (March 4, 2004)

Discussion Thread

Posted 12:37 p.m., March 22, 2004 (#33) - Alan Jordan
  Maybe you're right.


Park Factors (March 18, 2004)

Discussion Thread

Posted 4:36 p.m., March 18, 2004 (#1) - Alan Jordan
  Is it me or does that link go to another discussion?


Park Factors (March 18, 2004)

Discussion Thread

Posted 11:57 p.m., March 18, 2004 (#9) - Alan Jordan
I did a linear regression on OBA (each row is a separate PA) with independent variables for the park (home team and visiting team for each park, for 58 parameters), batter handedness, and the interaction of batter handedness and park (another 58 parameters). I estimated the park factors and then redid the regression, this time adding parameters for hitters, pitchers, and the defensive team. I estimated the park factors from this second regression and compared the two sets of park factors.

The top five changes in park factors are:

CHA Home Left 0.040
ARI Vis Right 0.039
SFN Home Left -0.037
COL Home Left -0.034
BOS Vis Right 0.030

San Fran/Home/Left-handed is number three out of 116. The numbers on the left tell you how much the park factors went up when controlling for who was batting and pitching, along with the defensive team. The data are 1999-2002, and no regression to the mean has been specifically applied to the park factors.

Since it's a linear model, the park factors themselves are additive instead of multiplicative or odds-ratio based. I've been playing around with using the linear model to approximate the logistic (odds ratio), and it turns out that with baseball data the hypothesis tests and predicted values match pretty damn well, and the linear model runs sooooo much quicker.

In summary, San Fran's home lefthanded park factor seems to be inflated by .037 which supports Tango's argument to some degree.


Park Factors (March 18, 2004)

Discussion Thread

Posted 9:23 p.m., March 19, 2004 (#15) - Alan Jordan
  Thanks for the offer Tango, and I may take you up on it in a week or so.
The problem with "doing it right" is that it takes so long to run. It can take 3 or 4 hours to estimate one model that contains batters and pitchers. On DIPs models where you need to add in the defensive team on top of that, it can take over 12 hours to run. That means you can spend a week doing a single specific hypotheis or set of models.
Applying park factors on a PA level like I'm doing, requires knowing the mix of PAs by park. That can known exactly if you have the data, but can only be approximated for the future. That adds a layer of noise.
There is also the problem of players who have rates of 0 or 1. When these are present in logistic or probit models, they give infinite paramaters and contaminate the hypothesis tests. Such players can be dropped or added together with other players with limited PAs ... or you can use some kind of bayesian model with priors that will take literally weeks to estimate on a home computer. If you had asked me this summer whether using odds ratios was important, I would have said absolutely yes, but I am slowly becoming disabused of that idea. I think that the small differences in talent are mostly obscurred by the fog of chance.


Copyright notice

Comments on this page were made by person(s) with the same handle, in various comments areas, following Tangotiger © material, on Baseball Primer. All content on this page remain the sole copyright of the author of those comments.

If you are the author, and you wish to have these comments removed from this site, please send me an email (tangotiger@yahoo.com), along with (1) the URL of this page, and (2) a statement that you are in fact the author of all comments on this page, and I will promptly remove them.